Article

Background Behind the Cognitive Burden Simulator: Understanding the Hidden Demands of Clinical Outcome Assessments

18 Dec 2025

UK, Spain

Clinical Outcome Assessments (COAs) are central to both clinical trials and routine care. They promise to capture the patient’s voice: experiences of pain, fatigue, anxiety, quality of life and daily functioning. On paper, they look straightforward. A question is asked, the patient recalls their experience and an answer is ticked. Yet beneath this surface lies a hidden assumption: that patients can effortlessly read, interpret and answer these questions.

In practice, answering even a single COA item requires a series of cognitive steps: parsing the sentence, recalling relevant memories, deciding what counts as relevant, mapping that experience onto the provided scale and finally committing to a response. Each step imposes a cognitive burden. For healthy individuals, this may be manageable, but for patients who are fatigued, in pain, anxious or cognitively impaired, the burden can be overwhelming.

In this article, we introduce the Cognitive Burden Simulator (CBS), a hybrid of syntactic tree diagrams and mind-mapping. The CBS is not a literal model of how the brain works, but a stylised framework designed to make visible the hidden layers of mental effort required to process COA items. We begin with the classic “the cat sat on the mat” example, then progressively map COA questions of increasing complexity.

Along the way, we situate these items in the broader context of working memory research, from Miller’s famous 7 ± 2 to Cowan’s narrower 4 ± 1. We consider how modern trends such as the Google Effect and reliance on large language models (LLMs) may further erode effective cognitive capacity.

Working Memory Limits: From Miller to Cowan

In 1956, George Miller published “The Magical Number Seven, Plus or Minus Two”, suggesting that human working memory could hold around seven “chunks” of information at once. This became a rule of thumb in psychology, education and design: keep phone numbers to seven digits, don’t build menus with more than seven items, and so on.

Miller himself was cautious. He believed that what counts as a “chunk” is flexible. For example, the string 1-9-4-5-2-0-2-5 can be remembered as eight single digits, as four chunks (organising the digits in pairs: 19-45-20-25) or as two chunks by treating them as years: “1945” and “2025”. So working memory capacity can be expanded artificially through chunking, rehearsal and familiarity.
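To make the regrouping concrete, here is a minimal Python sketch. The helper `chunk` is hypothetical and purely illustrative; it simply splits the same eight digits into groups of one, two and four. The information never changes, only the number of units that must be held.

```python
def chunk(digits: str, size: int) -> list[str]:
    """Split a digit string into consecutive groups of `size` characters."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]

digits = "19452025"  # the string 1-9-4-5-2-0-2-5 from the text, without hyphens

for size in (1, 2, 4):
    groups = chunk(digits, size)
    # Same content, fewer units to hold as the group size grows.
    print(f"{len(groups)} chunks: {'-'.join(groups)}")
```

This prints eight, four and two chunks in turn. In a real respondent, the two-chunk version only works if the grouping is meaningful (years). That familiarity is what the code cannot capture.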

Later research, particularly by Nelson Cowan, showed that the true biological bottleneck is smaller. Cowan’s 4 ± 1 model suggests that working memory is limited to about four concurrent chunks when rehearsal and chunking strategies are controlled. This means Miller’s “7 ± 2” may have overestimated capacity by nearly a factor of two.

For COAs, this difference is crucial. A simple Yes/No pain item may require four chunks:

·       Timeframe

·       Definition of pain

·       Recall of episodes

·       Mapping to Yes/No.

That already sits at Cowan’s limit. More complex items, such as multi-symptom checklists, shared-stem questions or trauma-related scales, can easily demand six, seven or more concurrent chunks, reaching Miller’s central estimate and, in the worst cases, pushing past it.
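The comparison above can be written as a small lookup. This is a sketch under stated assumptions: the bounds are the article’s summaries of Cowan (4 ± 1) and Miller (7 ± 2), while the `classify` rule and its labels are hypothetical and not the CBS’s own scoring.

```python
# Capacity bounds as summarised in the article (in chunks).
COWAN_CENTRAL, COWAN_UPPER = 4, 5  # Cowan: 4 ± 1
MILLER_UPPER = 9                   # Miller: 7 ± 2

def classify(load: int) -> str:
    """Place a concurrent chunk count relative to the two estimates (illustrative rule)."""
    if load > MILLER_UPPER:
        return "beyond Miller's upper bound"
    if load > COWAN_UPPER:
        return "beyond Cowan, within Miller's range"
    if load >= COWAN_CENTRAL:
        return "at Cowan's limit"
    return "within Cowan's limit"

# The four chunks listed for the simple Yes/No pain item.
pain_item = ["timeframe", "definition of pain", "recall of episodes", "mapping to Yes/No"]
print(len(pain_item), classify(len(pain_item)))  # 4 at Cowan's limit
print(classify(7))  # beyond Cowan, within Miller's range
```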

Cognitive Erosion in the Modern Era

If our natural capacity is already so limited, what happens when everyday habits reshape how we use memory? Two phenomena are particularly relevant: the Google effect and our reliance on large language models.

The Google effect (Sparrow, Liu & Wegner, 2011) showed that people are less likely to remember information if they expect to be able to look it up later. Instead of remembering facts, people remember where to find them. In effect, memory becomes externalised. This does not necessarily weaken memory itself, but it reallocates effort toward search and retrieval strategies.

With LLMs, this outsourcing goes even further. We no longer only offload facts; we outsource synthesis, summarisation and comparison. Why hold six items in working memory when an LLM can instantly summarise them? The risk is that people gradually under-practice chunking, integration and sustained comparison, which are the very skills needed to complete demanding COA items.

For patients under stress, fatigue, pain or cognitive impairment, the combination is even harsher. Effective working memory capacity may be closer to 2-3 chunks in real-world conditions. That is below Cowan’s estimate and far below Miller’s. This means that many COA items, as they are currently written, may simply become unanswerable in the way developers intend.

Mapping the Burden in COA Items

To illustrate this, let’s walk through different COA item formats, using the CBS to show the mental work required.

Simple Binary Items

“In the past week, did you experience any pain? Yes/No”

·       Easy in the sense of: familiar words, bounded timeframe, binary choice.

·       Difficult in the sense of: recall across seven days, threshold decision (“does a mild headache count?”)

·       Working Memory (WM) load: 4-5 chunks. This is manageable by Miller’s estimate but at Cowan’s limit.

·       Implication for COAs: patients under stress or fatigue may already be overloaded.

Frequency Scales

“Over the past month, how often have you felt anxious or worried about your health? Never / Rarely / Sometimes / Often / Always.”

·       Easy in the sense of: recognisable emotional terms, standard Likert scale.

·       Difficult in the sense of: recall across 30 days; fuzzy cut-offs between “sometimes” and “often”; increased possibility of self-monitoring bias.

·       WM load: 5-6 chunks, above Cowan’s limit and closer to Miller’s.

·       Implication for COA: at risk of overload, especially for anxious patients.

Multi-Symptom Checklists

“Over the past month, which of the following emotional effects have you experienced as a result of your health condition? Select all that apply: increased anxiety, frustration, loneliness, depression, reduced confidence, none of the above.”

·       Easy in the sense of: recognition (simple tick boxes) is easier than free recall.

·       Difficult in the sense of: attribution to health condition, overlapping categories, repeated micro-decisions, final contradiction check with “None of the above”.

·       WM load: 4-5 chunks per option, with cumulative fatigue across the list.

·       Implication for COA: decision fatigue and ‘satisficing’ (meaning ticking the obvious ones, skipping the rest).

Shared-Stem Items

“Thinking about the past four weeks, consider how your health condition has affected physical activity, cognitive function, emotional well-being, and social interactions. For each of the following, indicate how often you experienced these difficulties…”

·       Easy in the sense of: stem reduces repetition, consistent scale across items.

·       Difficult in the sense of: stem must be held in working memory across all sub-items; attribution to health; scale alignment across rows.

·       WM load: 5-7 chunks per row.

·       Implication for COA: while efficient for survey design, shared stems are cognitively harder for patients than simpler stand-alone items that restate the context each time.

Ranking Tasks

“In the past month, which symptoms caused the most difficulty? Select up to three and rank them 1–3 (1 = most impact, 3 = least impact).”

·       Easy in the sense of: recognisable symptom terms.

·       Difficult in the sense of: recall all symptoms; decide which count as “significant”; select three of them; compare relative impact; assign ranks.

·       WM load: 6-8 chunks, far exceeding Cowan’s limit and straining Miller’s.
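Part of the ranking burden is combinatorial, and a few lines of arithmetic show it. The sketch below is illustrative only: `ranking_burden` is a hypothetical helper, and the symptom count of 10 is an assumed example, not a figure from any instrument. It counts the pairwise judgements needed to compare three chosen symptoms and the number of distinct ranked answers available when exactly three of n symptoms are ranked.

```python
from math import comb, perm

def ranking_burden(n_symptoms: int, k: int = 3) -> dict[str, int]:
    """Rough counts for a 'select and rank k of n' item (illustrative only)."""
    return {
        # Judgements needed to compare every pair among the k chosen symptoms.
        "pairwise_comparisons": comb(k, 2),
        # Distinct ordered answers when exactly k of n symptoms are ranked.
        "possible_ranked_answers": perm(n_symptoms, k),
    }

print(ranking_burden(10))  # n = 10 is a hypothetical symptom count
# {'pairwise_comparisons': 3, 'possible_ranked_answers': 720}
```

The counts describe the answer space rather than working memory directly. They do show why “compare relative impact” and “assign ranks” add steps that a single-symptom frequency item does not require.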

Trauma-Linked Items

“In the past month, how often have you had nightmares or flashbacks related to a traumatic event?”

·       Easy in the sense of: timeframe clear, terms recognisable.

·       Difficult in the sense of: distinguishing nightmares from bad dreams; flashbacks from ordinary memories; attributing to trauma; emotional interference.

·       WM load: 6-7 chunks.

·       Implication for COA: emotional arousal further reduces WM capacity, so overload is likely in practice.

The hedged version is far worse:

“Reflecting on your experiences over the past month, to what extent, if at all, have you encountered involuntary, distressing mental imagery—such as nightmares or intrusive flashbacks—that appear to be linked, whether consciously recognised or not, to an event you perceive as distressing or traumatic?”

·       WM load: 7-9 chunks simultaneously (timeframe, reflection, mental imagery construct, examples, attribution to trauma, conscious versus unconscious link, “extent” mapping).

·       Implication for COA: unmanageable even for healthy respondents; impossible for patients under stress.

Descriptor Lists

“Describe your pain using the following terms: aching, stabbing, burning, cramping, crushing, shooting, tingling, radiating, numb, dull, sharp, gnawing, heavy, throbbing, cutting, pressing, exhausting, sickening, punishing.”

·       Easy in the sense of: recognition-based; familiar sensory words.

·       Difficult in the sense of: volume (19 options), overlapping terms, emotional valence (punishing, sickening), immense translation challenges.

·       WM load: 4-5 chunks per tick, but cumulative fatigue across 19 terms.

·       Implication for COA: risk of under-reporting due to fatigue or avoidance.
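Counting the descriptor list directly is a quick check on its length, and the item as printed contains 19 terms. The sketch below also multiplies the number of tick decisions by the per-tick estimate of 4-5 chunks. That product (`cumulative_index`, a hypothetical name) is not a concurrent working-memory load. It is only a crude proxy for the cumulative effort that the bullets above describe.

```python
descriptors = [
    "aching", "stabbing", "burning", "cramping", "crushing", "shooting",
    "tingling", "radiating", "numb", "dull", "sharp", "gnawing", "heavy",
    "throbbing", "cutting", "pressing", "exhausting", "sickening", "punishing",
]
CHUNKS_PER_TICK = (4, 5)  # per-tick estimate from the text

def cumulative_index(n_terms: int, per_tick: tuple[int, int]) -> tuple[int, int]:
    """Crude proxy for total effort: decisions x chunks per decision (not concurrent load)."""
    low, high = per_tick
    return n_terms * low, n_terms * high

n = len(descriptors)
print(n)  # 19
print(cumulative_index(n, CHUNKS_PER_TICK))  # (76, 95)
```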

Real-World Factors That Reduce Effective WM

All of the above estimates assume a quiet, well-rested, literate respondent. Real patients rarely fit that description.

·       Health literacy: low health literacy increases decoding effort for words such as “intermittent” or “functioning”, leaving fewer mental resources for recall.

·       Anxiety and stress: consume attentional bandwidth, effectively cutting WM capacity by 1-2 chunks.

·       Pain and fatigue: directly impair WM by hijacking attentional systems.

·       Time constraints: shorten rehearsal and consolidation; encourage satisficing.

·       Unfamiliarity: first-time respondents must decode both instructions and content.

In real-world conditions, effective WM capacity may shrink to 2-3 chunks. At that point, even seemingly easy Yes/No items can overload, especially when repeated across a long form.
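The arithmetic behind that claim can be sketched in a few lines. Assumptions, stated plainly: capacity starts at Cowan’s central estimate of 4. A stressor subtracts chunks linearly, using the 1-2 chunk reduction for anxiety given above. The item loads are the (low, high) ranges from the walk-through earlier in this article. The helper names are hypothetical.

```python
COWAN_CAPACITY = 4  # Cowan's central estimate, in chunks

# (low, high) concurrent-chunk estimates taken from the item walk-through above.
ITEM_LOADS = {
    "simple binary": (4, 5),
    "frequency scale": (5, 6),
    "shared-stem row": (5, 7),
    "trauma-linked": (6, 7),
    "ranking task": (6, 8),
    "hedged trauma item": (7, 9),
}

def effective_capacity(base: int, reduction: int) -> int:
    """Chunks left after a stressor removes `reduction` chunks (simple subtraction, an assumption)."""
    return base - reduction

def overloaded(capacity: int) -> list[str]:
    """Item types whose lowest load estimate already exceeds `capacity`."""
    return [name for name, (low, _high) in ITEM_LOADS.items() if low > capacity]

for reduction in (1, 2):  # anxiety and stress: "1-2 chunks" in the text
    cap = effective_capacity(COWAN_CAPACITY, reduction)
    print(f"capacity {cap}: {len(overloaded(cap))}/{len(ITEM_LOADS)} item types overloaded")
```

Under these assumptions, every item type is overloaded at both 3 and 2 chunks, even at its lowest estimate. That includes the simple Yes/No item. Real stressors are unlikely to combine this neatly, so the output should be read as an illustration of the direction of the effect, not a measurement.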

What Is the Point of the Cognitive Burden Simulator?

We created the Cognitive Burden Simulator (CBS) to visualise this hidden load. It merges old-school Chomskyan syntactic trees (sentence structure) with mind maps (branching cognitive processes). The CBS shows, in stylised form, the layers of parsing, recall, attribution, quantification and decision-making involved in answering COA items.
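One way to picture a tree-plus-mind-map structure in code is a simple labelled tree. The following is a hypothetical sketch, not the CBS itself. It assumes that each leaf stands for one element the respondent must hold at once, so counting leaves gives a rough chunk estimate. The branch labels for the Yes/No pain item are illustrative choices.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A labelled branch in a stylised processing map."""
    label: str
    children: list["Node"] = field(default_factory=list)

    def leaves(self) -> list[str]:
        """Collect leaf labels left to right (assumed here to approximate chunks)."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]

# Hypothetical map of the Yes/No pain item: one branch per processing stage.
pain_item = Node("answer item", [
    Node("parse", [Node("timeframe: past week"), Node("construct: pain")]),
    Node("recall", [Node("episodes in 7 days")]),
    Node("decide", [Node("threshold: does it count?"), Node("map to Yes/No")]),
])

print(len(pain_item.leaves()))  # 5
```

Five leaves fall within the 4-5 chunk range estimated for this item. A different but equally reasonable branching would give a slightly different count, which is the same trade-off between clarity and realism discussed below.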

It is important to stress that the CBS is not a literal cognitive replica. A true replica would look like a messy, dynamic simulation of eye-tracking heatmaps, brainwave timelines and working memory fluctuations. Such a model would be closer to reality, but it would be difficult to capture in a single visualisation and far less transparent than parsing and mind-mapping.

The CBS chooses clarity over realism. In many respects, it is a metaphor. By showing how even a simple Yes/No item fills Cowan’s working memory window and how complex trauma or ranking items push towards or past Miller’s upper bound, the simulator highlights just how fragile the design space is for COAs.

Conclusion

Clinical Outcome Assessments are not neutral measurement tools. They are linguistic and cognitive artefacts that place real demands on patients. The true biological bottleneck for working memory is narrower than we tend to think, and narrower than COA developers tend to assume. Add in the Google effect, LLM overreliance and the lived realities of patients facing anxiety, fatigue, pain, low literacy and time pressure, and effective capacity may shrink to 2-3 chunks or fewer.

Against this backdrop, COA items that assume patients can recall a month of symptoms, categorise them, attribute them to a health condition and rank their severity are cognitively unrealistic. The CBS is a way to make these hidden burdens visible. By starting with “the cat sat on the mat” and ending with multi-symptom, trauma-linked items, the CBS shows where items tip from feasible to overwhelming.

If we want COAs to truly capture the patient’s voice, they must be designed not only for linguistic clarity but also for cognitive accessibility. Otherwise, the answers we collect will reflect not just the patient’s condition but the limits of their working memory under duress.


Thank you for reading,


Mark Gibson, Leeds, United Kingdom

Nur Ferrante Morales, Ávila, Spain

September 2025

Originally written in English