Cognitive Load in Clinical Outcome Assessments: Optimising Question Design for Better Data Quality
Dec 17, 2025
Mark Gibson, United Kingdom
Health Communication and Research Specialist
The effectiveness of Clinical Outcome Assessments (COAs) and Patient-Reported Outcome (PRO) measures depends on how well they capture reliable, meaningful and accurate data. However, poorly designed assessments can introduce cognitive overload, leading to fatigue, biased responses and inaccurate recall.
The Google Effect, Miller’s Magic Number 7 (give or take 2) and cognitive load theory highlight how limitations in human memory and attention affect response accuracy. Later research by Nelson Cowan, however, shows that Miller’s estimate was overly generous: the true working memory capacity is closer to 4 ± 1 items. This smaller bandwidth means that many COA items, even when they look “average,” already exceed natural human limits. Furthermore, PRO guidance from regulatory agencies such as the FDA and EMA emphasises the importance of designing assessments that minimise respondent burden.
This article explores how cognitive overload impacts self-reported assessments and incorporates regulatory recommendations for designing clear, concise and effective COA instruments.
Understanding Cognitive Overload in PRO Measures
What is Cognitive Load?
Cognitive load refers to the mental effort required to process and respond to information. In PRO measures, cognitive overload occurs when:
· Questions contain multiple concepts, making them difficult to interpret.
· Recall periods are too long, forcing respondents to reconstruct events rather than recall specific instances.
· Too many response options lead to decision fatigue.
· Assessments are too long, causing survey fatigue and increasing dropout rates.
When patients experience cognitive overload, their answers become less reliable, introducing measurement error that affects clinical research and patient care.
The Google Effect and Memory-Based PRO Questions
The Google Effect describes how people are more likely to forget information they can look up online. This has major implications for recall-based PRO items, particularly when patients are asked to remember:
· Symptoms experienced over long periods, such as “How often have you felt fatigued in the past six months?”
· Medication adherence (e.g. “How many times did you miss a dose in the past year?”)
· Past clinical experiences (e.g. “How has your pain changed since last year?”).
Placed against Cowan’s 4 ± 1, the Google Effect becomes even more troubling. If working memory comfortably holds only around four items, digital offloading that reduces this to two or three available slots leaves patients with almost no spare capacity for demanding recall questions.
FDA and EMA Recommendations on Recall Periods
· FDA PRO Guidance: Recommends using shorter recall periods to improve data accuracy.
· EMA HRQoL Reflection Paper: Suggests that recall bias increases over time and shorter time frames reduce memory distortion.
Best practice, therefore, is to shorten recall periods.
Thus, a poor question would be: “Over the past year, how often have you felt anxious?”
A better question would be: “In the past week, how often have you felt anxious?”
This works because:
· A shorter time frame reduces recall errors.
· Respondents are less likely to estimate or reconstruct past emotions inaccurately.
· It minimises reliance on external sources, such as Google or medical records.
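To make this rule easy to enforce at the item-authoring stage, here is a minimal sketch of an automated recall-period check. The ProItem schema, the check_recall_period helper and the seven-day ceiling are all illustrative assumptions, not part of FDA or EMA guidance or of any real ePRO platform.

```python
from dataclasses import dataclass

# Hypothetical item schema for illustration only; real ePRO platforms
# define items in their own formats, and the right recall ceiling is an
# instrument- and endpoint-level decision.
@dataclass
class ProItem:
    text: str
    recall_days: int

MAX_RECALL_DAYS = 7  # assumed ceiling, in line with a one-week recall

def check_recall_period(item: ProItem) -> list[str]:
    """Warn when an item's recall period risks reconstruction over recall."""
    warnings = []
    if item.recall_days > MAX_RECALL_DAYS:
        warnings.append(
            f"Recall period of {item.recall_days} days may invite "
            "reconstruction rather than recall; consider shortening."
        )
    return warnings

# The 'poor' and 'better' anxiety items from above
poor = ProItem("Over the past year, how often have you felt anxious?", 365)
better = ProItem("In the past week, how often have you felt anxious?", 7)
print(check_recall_period(poor))    # one warning
print(check_recall_period(better))  # []
```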
Miller’s Magic Number and the Impact of Too Many Response Options
According to Miller’s Magic Number 7 (±2), the human brain can comfortably hold only 5–9 pieces of information in short-term memory. But if we accept Cowan’s refinement, the realistic range is closer to 3–5. This has major implications: even standard 7-point Likert scales may exceed the comfort zone for many patients, especially under stress. This means that:
· Questions with too many response options may overwhelm patients.
· Likert scales should be limited to 5-7 options for better differentiation. Better still, 3–5 well-defined categories may align more closely with actual human capacity.
· Complex multiple-choice questions lead to decision fatigue.
FDA and EMA Recommendations on Response Options
· FDA PRO Guidance: Recommends using clear, concise and simple response options to prevent participant fatigue.
· EMA HRQoL Reflection Paper: Suggests that fewer, well-defined categories improve data reliability.
Best practice, then, is to limit response choices in COA design.
Example of poor design: a 10-point Likert scale, which offers too many options.
Example of better design: a 5- or 7-point Likert scale, which gives better differentiation with less effort.
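The same authoring-stage idea can screen for overlong response scales. A minimal sketch follows; the check_option_count function and its thresholds are assumptions for illustration, since the right scale length is ultimately an instrument-level design decision.

```python
def check_option_count(options: list[str],
                       max_options: int = 7,
                       preferred: int = 5) -> str:
    """Heuristic check of response-scale length (illustrative thresholds)."""
    n = len(options)
    if n > max_options:
        return f"{n} options likely exceeds working-memory comfort; trim the scale."
    if n > preferred:
        return f"{n} options is acceptable, but {preferred} may reduce burden."
    return f"{n} options is within the suggested range."

ten_point = [str(i) for i in range(1, 11)]  # a 10-point numeric scale
five_point = ["Never", "Rarely", "Sometimes", "Often", "Always"]
print(check_option_count(ten_point))   # flagged: too many options
print(check_option_count(five_point))  # within range
```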
Multi-Concept Questions That Increase Cognitive Load
One of the most common issues in COAs is the double-barrelled or multi-concept question, where a single item asks about multiple experiences, symptoms or causes.
An example of high cognitive load in a COA question:
“In the past week, how often have you felt fatigued and emotionally exhausted due to work or home responsibilities?”
Problems:
· The question combines multiple symptoms: physical fatigue and emotional exhaustion.
· It blends multiple causes: work and home life.
· Participants might experience one symptom but not the other, leading to inaccurate responses.
FDA and EMA Recommendations on Single-Concept Items
· FDA PRO Guidance: Each question should focus on a single concept to ensure clarity and reduce response bias.
· EMA HRQoL Reflection Paper: Encourages separating distinct experiences into different questions for improved accuracy.
Best Practice: Break Down Multi-Concept Items
The question above, improved by breaking it into two items:
“In the past week, how often have you felt physically fatigued?”
“In the past week, how often have you felt emotionally exhausted?”
Why This Works:
· Each item focuses on a single symptom.
· Participants don’t need to disentangle multiple experiences in their minds.
· Enhances response reliability and reduces cognitive strain.
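As a first pass before human review, double-barrelled items can often be caught with a crude lexical screen. The heuristic below is a hypothetical sketch, not a substitute for cognitive debriefing: it will over-trigger on fixed phrases such as "aches and pains" and miss items that combine concepts without a conjunction.

```python
import re

# Crude lexical flag for possible multi-concept items. Illustrative only:
# a human reviewer still decides whether a flagged item truly mixes concepts.
COMPOUND_MARKERS = re.compile(r"\b(and|or)\b", re.IGNORECASE)

def looks_double_barrelled(item_text: str) -> bool:
    """Flag items that may combine multiple symptoms or causes."""
    return bool(COMPOUND_MARKERS.search(item_text))

original = ("In the past week, how often have you felt fatigued and "
            "emotionally exhausted due to work or home responsibilities?")
split = ["In the past week, how often have you felt physically fatigued?",
         "In the past week, how often have you felt emotionally exhausted?"]

print(looks_double_barrelled(original))            # True: send for review
print([looks_double_barrelled(q) for q in split])  # [False, False]
```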
Survey Length and Digital Assessment Considerations
PRO guidance recognises that long and complex assessments reduce engagement. In digital surveys, participants may:
· Skip difficult questions or abandon the survey if it is too long.
· Struggle with complex branching logic in electronic PROs (ePROs).
· Search for external information, leading to response bias.
FDA and EMA Recommendations on Survey Length
· FDA PRO Guidance: Suggests keeping PROs concise to maintain engagement and data quality.
· EMA HRQoL Reflection Paper: Highlights the need for progress indicators to reduce fatigue in longer assessments.
Best Practice: Keep Assessments Concise
· Use progress bars in digital assessments.
· Limit surveys to 10-15 minutes to maintain attention.
· Do not use complex skip logic that increases mental effort.
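Length and progress feedback are also easy to reason about numerically. The sketch below assumes an average of roughly 30 seconds per item, a placeholder that pilot timing data should replace; the function names are illustrative.

```python
SECONDS_PER_ITEM = 30   # assumed average; replace with pilot timing data
MAX_MINUTES = 15        # target ceiling suggested above

def estimated_minutes(n_items: int,
                      seconds_per_item: int = SECONDS_PER_ITEM) -> float:
    """Rough completion-time estimate for a fixed-length assessment."""
    return n_items * seconds_per_item / 60

def progress_label(current_item: int, n_items: int) -> str:
    """Text for a simple progress indicator in a digital assessment."""
    pct = round(100 * current_item / n_items)
    return f"Question {current_item} of {n_items} ({pct}% complete)"

n_items = 40
minutes = estimated_minutes(n_items)
if minutes > MAX_MINUTES:
    print(f"Estimated {minutes:.0f} min exceeds the {MAX_MINUTES}-minute target.")
print(progress_label(10, n_items))  # Question 10 of 40 (25% complete)
```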
Final Recommendations: How to Design PRO Measures That Reduce Cognitive Load
To design PROs that align with FDA and EMA guidance while minimising cognitive overload, follow these principles:
1. Use Short Recall Periods
❌ “Over the past 6 months, how often have you had pain?”
✅ “In the past week, how often have you had pain?”
2. Avoid Multi-Concept Questions
❌ “How often have you felt anxious and had difficulty sleeping?”
✅ “How often have you felt anxious?” and then a separate question for sleep difficulties.
3. Limit Response Options to 5–7 Choices at Most
❌ Overly detailed 10-point Likert scales.
✅ A simple 5- or 7-point Likert scale.
4. Optimise Digital Assessments
❌ Overly complex skip logic.
✅ Intuitive adaptive questioning and progress indicators.
By following cognitive psychology principles and regulatory recommendations, PRO measures can be made more user-friendly, improving data quality and reducing measurement error. And by acknowledging Cowan’s 4 ± 1, rather than Miller’s more optimistic 7 ± 2, as the true human limit, developers can better calibrate assessments to match what patients can realistically process.
Thank you for reading,
Mark Gibson
Cockermouth, United Kingdom, August 25
Originally written in English
