Working Memory and Clinical Outcome Assessments: A Comparative Load Analysis
19 December 2025
In the last article, we introduced the Cognitive Burden Simulator (CBS), a hybrid mindmap-syntactic tree model that visualises how patients might visually and cognitively process Clinical Outcome Assessment (COA) items. The CBS is illustrative, even metaphorical, rather than literal, but it makes visible the hidden layers of effort behind seemingly simple questions.
In this article, we build on that foundation by focusing specifically on the demands of working memory (WM). Instead of walking through each item in detail, this article takes a higher-level view: mapping different types of COA items against established WM capacity limits.
Why Working Memory Matters
Working memory is the “mental workspace” used to hold and manipulate information while performing tasks. For COAs, it is the resource patients rely on when recalling timeframes, interpreting items, attributing symptoms to health conditions and mapping experiences onto response options.
The limits are strict:
Miller’s classic “7 ± 2” (1956) provided an optimistic upper bound.
Cowan’s “4 ± 1” (2001) offered a more conservative estimate, closer to the biological bottleneck.
Real-world factors such as low health literacy, stress, pain, fatigue and anxiety can all reduce effective WM to 2–3 chunks.
When COA items demand more than this, patients may skip, guess or satisfice rather than answer as intended.
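To make the mapping concrete, here is a minimal sketch (in Python) of how an item’s estimated chunk load might be checked against the three ceilings above. The chunk estimate, the function name and the ±1 “at the limit” band are illustrative assumptions on our part, not part of any validated instrument.

```python
# Minimal sketch: compare an item's estimated peak WM load (in chunks)
# against the three ceilings discussed above. The one-chunk "at the limit"
# band is an illustrative choice, not an established cut-off.

CEILINGS = {
    "Miller (7 ± 2)": 7,
    "Cowan (4 ± 1)": 4,
    "Real world (2–3)": 3,
}

def feasibility(estimated_chunks: int) -> dict:
    """Return a verdict per ceiling for a given chunk estimate."""
    verdicts = {}
    for label, ceiling in CEILINGS.items():
        if estimated_chunks < ceiling:
            verdicts[label] = "feasible"
        elif estimated_chunks <= ceiling + 1:
            verdicts[label] = "at the limit"
        else:
            verdicts[label] = "overload"
    return verdicts

# Example: a binary Yes/No pain item, estimated at about 5 chunks in the table below.
print(feasibility(5))
# {'Miller (7 ± 2)': 'feasible', 'Cowan (4 ± 1)': 'at the limit', 'Real world (2–3)': 'overload'}
```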
WM Load Across COA Items
The table below compares several COA item types, estimating their peak working memory load in “chunks” and evaluating feasibility under each of the three ceilings described above.
| Item Type (examples) | Core Processing Demands | Est. WM Load (chunks) | Feasible under Miller (7 ± 2)? | Feasible under Cowan (4 ± 1)? | Feasible for Patients in Real World (2–3 chunks)? |
| --- | --- | --- | --- | --- | --- |
| Baseline syntax (“The cat sat on the mat”) | Parse subject + predicate | 3–4 | ✅ Yes | ✅ Yes | ⚠️ Borderline if fatigued |
| Binary Yes/No (“In the past week, did you experience pain?”) | Timeframe, recall, threshold decision, Yes/No mapping | 4–5 | ✅ Yes | ⚠️ At limit | ❌ Often overload |
| Frequency scale (“How often anxious/worried about health?”) | Timeframe, define symptom, recall across 30 days, scale mapping | 5–6 | ⚠️ Edge | ❌ Overload | ❌ Overload |
| Multi-symptom checklist (“Which emotional effects… select all”) | Recall timeframe, attribution to health, evaluate each option, multi-select | 4–5 per item, cumulative fatigue | ✅ Individually | ⚠️ At edge cumulatively | ❌ Overload |
| Shared stem (3 items) (“Thinking about past 4 weeks…”) | Hold stem + domains, recall for each sub-item, map to scale, alignment | 5–7 per row | ⚠️ Edge | ❌ Overload | ❌ Overload |
| Ranking task (“Select up to three symptoms and rank”) | Recall symptoms, judge disruption, select three, compare, rank | 6–8 | ⚠️ At ceiling | ❌ Overload | ❌ Overload |
| Trauma (simple) (“Nightmares or flashbacks?”) | Recall timeframe, distinguish types, attribute to trauma, map to scale | 6–7 | ⚠️ At ceiling | ❌ Overload | ❌ Overload |
| Trauma (hedged) (“Reflecting… distressing imagery…”) | Long syntax, reflection, examples, attribution conscious/unconscious, event qualification, mapping | 7–9 | ❌ Exceeds | ❌ Exceeds | ❌ Impossible |
| Pain descriptors list (18) (“Describe pain using terms…”) | Recall pain, compare to descriptor, tick/skip, repeat ×18 | 4–5 per tick, cumulative | ✅ Per item | ⚠️ At limit cumulatively | ❌ Overload |
| Abstract pain alignment (“To what extent do descriptors align…”) | Reflect on discomfort, match to descriptor, judge extent, integrate dimensions (characteristics, intensity, impact) | 6–8 | ⚠️ Edge | ❌ Overload | ❌ Overload |
| Shared stem alone (“Past 4 weeks, health affects life domains”) | Hold timeframe + 4 domains + attribution | 3–4 | ✅ Yes | ✅ Yes | ⚠️ Borderline if stressed |
Key Insights
1. Binary items are not trivial: even a Yes/No pain item sits right at Cowan’s 4 ± 1 limit. For patients under stress or fatigue, this alone can tip into overload.
2. Frequency scales and checklists exceed Cowan. They rely on fuzzy thresholds (“sometimes” versus “often”) and cumulative micro-decisions.
3. Shared stems are double-edged. They reduce repetition in text but increase WM demand because the stem must be held across multiple items.
4. Ranking and trauma items push beyond Miller. These tasks are structurally unanswerable as intended for many respondents.
5. Descriptor lists create cumulative fatigue. Each micro-decision is manageable, but across 18 descriptors the effort compounds.
6. Real-world erosion (2–3 chunks): health literacy, stress, anxiety, fatigue and pain shrink capacity further. Under these conditions, all but the simplest items become cognitively unrealistic. A toy sketch of this compounding erosion follows below.
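To illustrate insights 5 and 6 together, the toy model below assumes that each tick on a long descriptor list erodes effective capacity slightly, so per-tick loads that start out manageable eventually exceed what a respondent at the real-world ceiling can hold. The per-tick load and the erosion rate are hypothetical placeholders, not measured values.

```python
# Toy model of cumulative fatigue on an 18-term descriptor list. The starting
# capacity (3 chunks) follows the real-world estimate above; the per-tick load
# (2.5 chunks) and erosion rate (0.04 chunks per descriptor) are hypothetical.

def first_overloaded_descriptor(per_tick_load: float = 2.5,
                                start_capacity: float = 3.0,
                                erosion_per_tick: float = 0.04,
                                n_descriptors: int = 18):
    """Return the 1-based index of the first descriptor answered over capacity."""
    for i in range(1, n_descriptors + 1):
        capacity = start_capacity - erosion_per_tick * (i - 1)  # fatigue so far
        if per_tick_load > capacity:
            return i
    return None  # the whole list stays within capacity

print(first_overloaded_descriptor())  # 14: overload before the 18-term list ends
```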
Why This Matters
COA design is often treated as a matter of wording, translation or psychometric scaling. But the real bottleneck is cognitive. If an item requires more WM chunks than patients can realistically hold, the data generated will reflect coping strategies rather than lived experience.
This does not mean that patients “fail”. It means that the instrument fails the patient.
Towards Cognitive Accessibility
The CBS is one way of making these burdens visible. By mapping COA items onto WM limits, we can identify which formats are feasible, which are borderline and which are likely overwhelming.
Design implications include:
Keep items within the window of 4 ± 1 (Cowan’s number).
Use short scales wherever possible.
Be cautious with shared stems: they reduce visual load but increase WM strain.
Avoid ranking tasks and heavily hedged trauma questions.
Streamline descriptor lists to minimise fatigue.
Conclusion
Working memory is the hidden constraint in patient-reported outcomes. Miller’s 7 ± 2 suggested we could hold more than we actually can. Cowan’s 4 ± 1 shows the truer limit. Real-world erosion from stress, fatigue and lower health literacy narrows this window further.
By situating COA items against these limits, we can see clearly where instruments exceed human capacity. The goal is not to lower standards of measurement but to design assessments that are cognitively accessible, so that what we measure truly reflects patients’ experiences rather than the strain of answering.
Thank you for reading,
Mark Gibson, Madrid, Spain
Nur Ferrante Morales, Ávila, Spain
September 2025
Originally written in English.
