Diagnostic Testing & Workup
Psychiatry has no validated diagnostic biomarker — diagnoses rest on clinical assessment, not a blood test or scan. Lab and biomarker testing earns its place by detecting reversible contributors, guiding safe prescribing, and stratifying treatment rather than confirming a disorder.
Medically reviewed · Last updated June 2026 · 48 min read
Contents
- 1. Why psychiatry still has no validated diagnostic test
- 2. Inflammatory and Immune Biomarkers in Psychiatry
- 3. Neuroimaging Biomarkers in Psychiatry
- 4. Genetic and Pharmacogenomic Testing in Psychiatry
- 5. Electrophysiological Biomarkers in Psychiatry
- 6. Neuroendocrine and Metabolic Biomarkers in Psychiatry
- 7. Digital Phenotyping and Computational Biomarkers in Psychiatry
Why psychiatry still has no validated diagnostic test
The question the series has been circling
Every document in this series ends in roughly the same place: a real, replicated, group-level signal that nonetheless fails as a clinical test. Inflammatory markers, neuroimaging, genetics, electrophysiology, neuroendocrine and metabolic measures, digital phenotyping: each is mechanistically grounded, each has years and often decades of supporting data, and not one of them can diagnose a psychiatric disorder in the individual patient sitting across from a clinician. After half a century of effort and enormous expenditure, psychiatry remains a major branch of medicine without a single biomarker that confirms or excludes its core diagnoses.
This capstone argues that this is not primarily a technical failure waiting on better machines or bigger datasets. It is a category problem, and understanding it changes what we should expect biomarkers to do, where we should look, and which current claims deserve skepticism. The individual documents establish the facts; this one explains the pattern they make.
The diagnostic-validity gap
The foundational obstacle is that biomarkers are validated against diagnoses, and psychiatric diagnoses are of uncertain biological validity. Robins and Guze (1970) laid out the criteria by which a psychiatric diagnosis earns validity — characteristic features, course, family aggregation, laboratory correlates, delimitation from other disorders. Kendell and Jablensky (2003) later drew the sharp distinction the field had blurred: DSM categories possess utility (they organize communication and treatment) without established validity (they do not necessarily carve nature at its joints).
This matters concretely. A diagnosis like major depressive disorder is polythetic and profoundly heterogeneous: the criteria can be met by hundreds of distinct symptom combinations, and two patients carrying the same diagnosis may share almost no symptoms (Fried and Nesse, 2015). A single biomarker cannot map cleanly onto a target that is, biologically, a federation of conditions. Kapur, Phillips, and Insel (2012) named this directly in asking why biological psychiatry has produced so few clinical tests: the field has been validating candidate markers against a gold standard — the DSM diagnosis — that is itself not biologically valid, a circularity that caps how well any marker can ever perform. A test can be no more coherent than the construct it is asked to detect.
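The "hundreds of distinct symptom combinations" figure can be checked with a few lines of arithmetic. The sketch below is a simplification. It assumes the DSM-5 count rule for a major depressive episode (at least five of nine symptoms, at least one of which is depressed mood or anhedonia) and treats each symptom as a single present/absent item. The symptom labels are shorthand chosen for illustration. Because it collapses directional variants such as insomnia versus hypersomnia, it undercounts the profiles possible in practice.

```python
from itertools import combinations

# Simplified criterion A: nine symptoms, each treated as present/absent.
# Assumption: directional variants (insomnia vs hypersomnia, weight gain vs loss)
# are collapsed into single items, so real-world profiles are undercounted.
SYMPTOMS = [
    "depressed mood", "anhedonia", "weight/appetite change", "sleep disturbance",
    "psychomotor change", "fatigue", "worthlessness/guilt",
    "poor concentration", "thoughts of death",
]
CORE = {"depressed mood", "anhedonia"}


def qualifying_profiles():
    """Yield every symptom set meeting the count rule: at least five of the
    nine symptoms, including at least one core symptom."""
    for k in range(5, len(SYMPTOMS) + 1):
        for combo in combinations(SYMPTOMS, k):
            if CORE & set(combo):  # non-empty intersection = has a core symptom
                yield combo


profiles = list(qualifying_profiles())
print(len(profiles))  # 227

# Two qualifying profiles that overlap on a single symptom.
a = {"depressed mood", "weight/appetite change", "sleep disturbance",
     "psychomotor change", "fatigue"}
b = {"depressed mood", "anhedonia", "worthlessness/guilt",
     "poor concentration", "thoughts of death"}
print(a & b)  # {'depressed mood'}
```

Under these assumptions the rule admits 227 qualifying profiles, and the two example patients both meet criteria while sharing only depressed mood.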
This single point reframes the entire series. The "failure" of each biomarker class is partly the expected consequence of pointing good measurement tools at a poorly specified target.
The recurring methodological lesson
Layered on top of the category problem is a methodological one, and it recurs so consistently across the series that it is the most useful single lesson for a clinical reader. The pattern: a high-profile result emerges from a modest sample, is celebrated as a breakthrough, and then fails to replicate in larger, independent data.
The dexamethasone suppression test was the neuroendocrine prototype of this pattern in the 1980s. The 5-HTTLPR-by-stress interaction (Caspi and colleagues, 2003) was overturned by Border and colleagues (2019), as detailed in the genetic document. The connectivity biotypes of Drysdale and colleagues (2017) did not survive reanalysis by Dinga and colleagues (2019), per the neuroimaging document. The EEG predictive literature was deflated by Widge and colleagues (2019), per the electrophysiology document. And Marek and colleagues (2022) quantified the general principle: reproducible brain–behavior associations require sample sizes orders of magnitude larger than the field has typically used, because true effects are small and small samples inflate them.
The mechanism is the same each time and is not specific to psychiatry. It is the combination Ioannidis (2005) analyzed, closely related to the winner's curse: small samples, high analytic flexibility, and publication bias together generate exciting findings that are often false or substantially inflated. The practical takeaway is a posture: treat any single-study biomarker claim, especially one with a small sample and a commercial sponsor, as a hypothesis rather than a finding. The corrective is already underway (large consortia such as ENIGMA and the Psychiatric Genomics Consortium, preregistration, mandatory out-of-sample validation, and adequately powered prospective trials), but it requires patience the field's enthusiasm has historically lacked.
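The inflation mechanism can be made concrete with a small simulation. This is a minimal sketch, not a model of any particular literature. It assumes a hypothetical true effect (d = 0.2), hypothetical two-arm studies with 20 participants per arm, and a publication filter that keeps only positive results passing a conventional two-tailed 5% test. Analytic flexibility is not modeled, so if anything the sketch understates the problem.

```python
import math
import random
import statistics

TRUE_D = 0.2        # hypothetical true standardized mean difference
N_PER_GROUP = 20    # hypothetical small-study sample size per arm
T_CRIT = 2.024      # approx. two-tailed 5% critical t for df = 38
N_STUDIES = 2000    # number of simulated studies


def run_study(rng):
    """Simulate one two-group study; return (observed d, t statistic)."""
    cases = [rng.gauss(TRUE_D, 1.0) for _ in range(N_PER_GROUP)]
    controls = [rng.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    # Pooled SD for equal group sizes.
    pooled_sd = math.sqrt((statistics.variance(cases) + statistics.variance(controls)) / 2)
    d = (statistics.mean(cases) - statistics.mean(controls)) / pooled_sd
    t = d * math.sqrt(N_PER_GROUP / 2)  # equal-n two-sample t expressed via d
    return d, t


rng = random.Random(42)
studies = [run_study(rng) for _ in range(N_STUDIES)]
# Publication filter: only positive, statistically significant results "appear".
published = [d for d, t in studies if t > T_CRIT]

print(f"true d = {TRUE_D}")
print(f"mean d, all studies       = {statistics.mean(d for d, _ in studies):.2f}")
print(f"mean d, published studies = {statistics.mean(published):.2f} (n = {len(published)})")
```

Averaged over every simulated study, the estimate sits close to the true d. Averaged over only the "published" ones, it cannot fall below about 0.64, because with 20 per arm no smaller observed effect clears the significance threshold. The filter alone more than triples the apparent effect.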
Where biomarkers already earn their place
None of this means biomarkers are clinically useless today. It means their legitimate role has been misnamed. The error is conflating two very different jobs: diagnosing the disorder (which no marker can do) and detecting treatable contributors and guiding safe prescribing (which several do well).
This second role is real and routine. Thyroid function, vitamin B12, and vitamin D identify reversible contributors to depressive presentations; checking them is sound medicine precisely because a positive result changes management toward a treatable cause. Pharmacokinetic genotyping of CYP2D6 and CYP2C19 has defensible guideline support for dosing, and HLA safety alleles (such as HLA-B*15 for carbamazepine) prevent serious harm. These markers do not diagnose anything; they detect contributors and protect against toxicity. A disciplined clinician orders them for those purposes and treats the aspirational diagnostic and combinatorial-prediction claims (discussed and qualified in the genetic and electrophysiology documents) as not yet ready. Keeping these two roles separate dissolves much of the apparent paradox of a field that both lacks biomarkers and uses laboratory tests every day.
Stratification, not diagnosis, is the credible path
If the diagnostic dream is built on an invalid target, the productive ambition is more modest and more achievable: stratification. Rather than asking a marker to confirm "depression," ask it to predict which treatment will work, or to define a mechanistically coherent subgroup within the heterogeneous category. This is a question biomarkers can actually answer, because it does not require the diagnosis to be biologically unified — only the subgroup.
The series' best example is the immunometabolic, anhedonic subtype, and its credibility comes precisely from convergence across biomarker classes. The high-CRP group from the inflammatory document, the metabolic-marker and atypical-neurovegetative profile from the neuroendocrine and metabolic document (Milaneschi and colleagues), the reward-circuit connectivity changes from the neuroimaging document, and the anhedonia/psychomotor signatures detectable by electrophysiology and digital phenotyping all appear to describe the same patients. When independent measurement methods triangulate on one group, that convergence is far more trustworthy than any single marker — and it is exactly the kind of structure the single-modality biotype failures lacked.
The proof of concept that stratification can change treatment is the infliximab result (Raison and colleagues, 2013): an average-null trial concealing a real benefit in the high-inflammation subgroup. The general principle it illustrates — that biomarker-stratified designs reveal effects that whole-sample trials average away — is where the field's energy is most rationally spent, and multimodal composite models (as in the EMBARC program) are more likely to succeed than any solitary biomarker.
Rebuilding the targets
If the target is the problem, some of the most important work is reconstructing the targets themselves. Three programs attempt this. The Research Domain Criteria framework (Insel and colleagues, 2010; Cuthbert and Insel, 2013) proposes setting DSM categories aside for research purposes and studying dimensions — reward, threat, cognitive control — across diagnoses, anchored to circuits and biology, on the bet that markers will map onto mechanisms rather than categories. Normative modeling (Marquand and colleagues, 2019) sidesteps the heterogeneity problem differently, locating an individual on a population distribution for each measure and quantifying deviation rather than forcing a binary diagnosis. Dimensional nosologies such as HiTOP (Kotov and colleagues, 2017) reorganize psychopathology hierarchically rather than categorically.
Each has limitations — RDoC is a research framework, not a clinical nosology, and has been criticized for reductionism and for sidelining the social and contextual dimensions of suffering — but together they represent the field's recognition that better biomarkers may require better constructs first, not the other way around.
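Normative modeling replaces a case/control label with a position on a reference distribution. The sketch below shows only that core idea, under loud simplifying assumptions: a single reference sample with no covariates, a Gaussian shape, and made-up values in arbitrary units. Published normative models (Marquand and colleagues, 2019) are considerably richer and typically condition on covariates such as age and sex. The function name `deviation` is illustrative, not taken from any library.

```python
import statistics
from statistics import NormalDist


def deviation(value, reference):
    """Locate one individual's measurement on a reference (normative) sample.

    Returns (z-score, centile), with the centile computed under a Gaussian
    assumption. Hypothetical simplification: no covariates, one reference sample.
    """
    mu = statistics.mean(reference)
    sigma = statistics.stdev(reference)
    z = (value - mu) / sigma
    centile = 100 * NormalDist().cdf(z)
    return z, centile


# Synthetic reference sample: made-up values in arbitrary units, not real data.
reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 11.5, 8.5, 10.0, 10.0]

for person, value in {"A": 10.2, "B": 7.1, "C": 13.4}.items():
    z, c = deviation(value, reference)
    # Illustrative convention only; not a validated clinical cutoff.
    flag = "extreme deviation" if abs(z) > 2 else "within normative range"
    print(f"{person}: z = {z:+.2f}, centile = {c:.1f} -> {flag}")
```

The continuous z-score is the output of interest. The |z| > 2 label in the example is an arbitrary convention added for readability, which is exactly the kind of binary cut normative modeling tries to avoid forcing.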
Honest limits and what would change the field
Two limits are permanent rather than provisional. First, psychiatric phenomena are partly irreducibly psychological and social; a biomarker will never fully capture meaning, narrative, or context, and the levels-of-explanation argument developed in the depression etiology capstone applies here — a valid biological correlate does not displace the personal and relational levels at which psychiatric illness is also real. Second, much of what biomarkers measure is bidirectional, as much consequence as cause, which limits how far any single marker can anchor a diagnosis.
There is also a deeper structural prediction worth stating plainly. This library's account of causation is a web, not a list, with two convergence points — chronic stress and allostatic load upstream and neuroplasticity/BDNF downstream — between which the many specific pathways are bracketed. If that picture is right, then no single peripheral marker of any one upstream pathway should be expected to be diagnostic, because the pathways are partly interchangeable routes to shared convergence nodes. This reframes the absence of a single diagnostic biomarker as a prediction confirmed rather than a disappointment, and it suggests the most informative future markers may be those that index the convergence nodes themselves — cumulative stress exposure (hair cortisol, allostatic-load composites) and plasticity capacity — rather than any individual mechanism.
What would actually move the field is therefore reasonably clear: better targets (validated subtypes and dimensions in place of heterogeneous categories); adequately powered, preregistered, prospectively validated biomarker-stratified trials; multimodal composites over single markers; a shift toward markers that are causal and modifiable, so they guide treatment rather than merely label; and honest regulation of the commercial claims that have repeatedly run ahead of the evidence.
Bottom line
Psychiatry lacks a validated diagnostic biomarker not because measurement has failed but because the targets — DSM diagnoses — are biologically heterogeneous constructs that no single marker can coherently detect, and because the literature has been distorted by an underpowered, replication-poor research culture. The mature response is threefold. Use biomarkers where they already earn their place: detecting reversible contributors and guiding safe, individualized dosing. Pursue stratification rather than diagnosis, with the convergent immunometabolic subtype as the leading near-term target and biomarker-stratified trial designs as the method. And rebuild the constructs — through dimensional and normative approaches — so that future markers have something coherent to map onto. The disciplined stance toward the whole field is neither the credulity of the commercial pitch nor the cynicism of those who declare biology irrelevant, but a calibrated optimism: skeptical of single-study and proprietary claims, genuinely hopeful about stratified, multimodal, mechanism-anchored programs, and clear-eyed that the path to biomarker-guided psychiatry runs through better questions, not just better instruments.
Selected references
- Robins E, Guze SB. Establishment of diagnostic validity in psychiatric illness. Am J Psychiatry. 1970.
- Kendell R, Jablensky A. Distinguishing between the validity and utility of psychiatric diagnoses. Am J Psychiatry. 2003.
- Kapur S, Phillips AG, Insel TR. Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Mol Psychiatry. 2012.
- Fried EI, Nesse RM. Depression is not a consistent syndrome: an investigation of unique symptom patterns in the STAR*D study. J Affect Disord. 2015.
- Insel T, et al. Research Domain Criteria (RDoC): toward a new classification framework for research on mental disorders. Am J Psychiatry. 2010.
- Cuthbert BN, Insel TR. Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Med. 2013.
- Kotov R, et al. The Hierarchical Taxonomy of Psychopathology (HiTOP). J Abnorm Psychol. 2017.
- Marquand AF, et al. Conceptualizing mental disorders as deviations from normative functioning. Mol Psychiatry. 2019.
- Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005.
- Marek S, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022.
- Border R, et al. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression. Am J Psychiatry. 2019.
- Drysdale AT, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med. 2017.
- Dinga R, et al. Evaluating the evidence for biotypes of depression. NeuroImage Clin. 2019.
- Widge AS, et al. Electroencephalographic biomarkers for treatment response prediction in major depressive illness: a meta-analysis. Am J Psychiatry. 2019.
- Raison CL, et al. A randomized controlled trial of the TNF antagonist infliximab for treatment-resistant depression. JAMA Psychiatry. 2013.
- Milaneschi Y, et al. Depression heterogeneity and its biological underpinnings: immunometabolic depression. Biol Psychiatry. 2020.
- Abi-Dargham A, Horga G. The search for imaging biomarkers in psychiatric disorders. Nat Med. 2016.
- Fernandes BS, et al. The new field of 'precision psychiatry'. BMC Med. 2017.
- Carvalho AF, et al. Evidence-based umbrella review of 162 peripheral biomarkers for major mental disorders. Transl Psychiatry. 2020.
- Scarr E, et al. Biomarkers for psychiatry: the journey from fantasy to reality. Mol Diagn Ther. 2015.
Inflammatory and Immune Biomarkers in Psychiatry
Diagnostics & Biomarkers series
The proposition
If a subset of mood disorder is, in some causal sense, an inflammatory illness, then peripheral markers of inflammation ought to do three things: distinguish that subset from the rest, track with its symptoms, and predict who responds to what. Of all the candidate biomarkers in psychiatry, inflammatory markers come closest to delivering on the first two and have produced the field's single most provocative stratification result on the third. They are also cheap, ubiquitous in general medicine, and mechanistically legible — which is precisely why they have become the template for what a usable psychiatric biomarker might look like, and a useful case study in why we still don't have one.
The evidence
The association between depression and peripheral inflammation is among the most replicated findings in biological psychiatry. Meta-analyses published between 2009 and 2020 (Howren and colleagues, 2009; Dowlati and colleagues, 2010; Haapakoski and colleagues, 2015; and the large pooled analysis by Osimo and colleagues, 2020) converge on modestly elevated interleukin-6 (IL-6), tumor necrosis factor-alpha (TNF-α), and C-reactive protein (CRP) in depressed cohorts relative to controls. The effect sizes are small (standardized mean differences typically 0.1–0.5), and heterogeneity is high, but the direction is consistent across designs.
The clinically interesting structure emerges when the distribution is examined rather than the mean. Roughly a quarter to a third of depressed patients show CRP above 3 mg/L, the threshold general medicine uses for "low-grade systemic inflammation." This is the empirical basis for the proposed inflammatory subtype — not a claim that all depression is inflammatory, but that a sizable minority is, and that this minority may be biologically and therapeutically distinct.
Two further observations make the case more than correlational. Longitudinal data (Khandaker and colleagues, 2014) show that elevated childhood IL-6 predicts later depressive episodes, establishing temporal precedence. And Mendelian randomization studies of the IL-6 receptor gene point toward a causal contribution of IL-6 signaling to depression risk rather than purely reactive elevation, though these analyses carry their own assumptions and should be read cautiously.
The mechanism
The peripheral signal matters only if it reaches the brain and does something there, and the pathways are now reasonably well characterized. Pro-inflammatory cytokines influence the central nervous system through humoral routes (circumventricular organs, active transport), neural routes (vagal afferents), and induction of central immune signaling. Once central, the best-supported downstream consequence runs through the kynurenine pathway: inflammation activates indoleamine 2,3-dioxygenase (IDO), shunting tryptophan away from serotonin synthesis and toward kynurenine metabolites — including quinolinic acid, an NMDA receptor agonist with neurotoxic potential, at the expense of neuroprotective kynurenic acid.
A second mechanism, developed extensively by Felger and Miller, links inflammation to dopamine. Cytokines reduce dopamine synthesis and release in the basal ganglia, and the symptoms most tightly coupled to elevated CRP — anhedonia, psychomotor slowing, fatigue, reduced motivation — are dopaminergic in character rather than the cognitive-affective symptoms of classic depression. This is a recurring theme worth flagging for the Dopaminergic Reward and Anhedonia account: the inflammatory subtype may be, functionally, an anhedonic/motivational subtype.
Inflammation also suppresses BDNF and impairs hippocampal neurogenesis, connecting it directly to the Neuroplasticity/BDNF convergence that runs through this entire library, and elevated cytokines are a well-documented downstream effect of HPA-axis dysregulation and chronic stress.
The stratification result
The result that turned inflammatory markers from an etiologic curiosity into a biomarker candidate is Raison and colleagues' 2013 trial of infliximab, a TNF antagonist, in treatment-resistant depression. Across the whole sample, infliximab was no better than placebo: a clean negative trial. But in the subgroup with baseline high-sensitivity CRP above 5 mg/L, the anti-inflammatory outperformed placebo, while in low-CRP patients it did slightly worse. This is the field's clearest demonstration of a biomarker-defined treatment interaction: the average effect was null because two opposite effects cancelled.
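The arithmetic of cancellation is simple: a whole-sample effect is the average of subgroup effects, weighted by subgroup size. The numbers below are hypothetical, chosen only to show the pattern. They are not the infliximab trial's results, and the 30/70 split and the effect sizes are assumptions.

```python
import math


def pooled_effect(subgroups):
    """Whole-sample treatment effect: each subgroup's drug-minus-placebo effect
    weighted by the fraction of participants it contains."""
    total = sum(frac for frac, _ in subgroups)
    if not math.isclose(total, 1.0):
        raise ValueError("subgroup fractions must sum to 1")
    return sum(frac * effect for frac, effect in subgroups)


# Hypothetical (fraction of sample, drug-minus-placebo improvement in scale points).
high_crp = (0.3, +4.0)   # minority that benefits
low_crp = (0.7, -1.5)    # majority that does slightly worse on drug

print(pooled_effect([high_crp, low_crp]))
```

With these inputs the pooled effect is 0.3 × 4.0 + 0.7 × (−1.5) = 0.15 points. That is close enough to zero for a whole-sample analysis to report a null, even though the two subgroups differ by 5.5 points.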
A parallel signal comes from antidepressant selection. Analyses of the GENDEP cohort (Uher and colleagues, 2014) suggested that higher baseline CRP predicted better response to the noradrenergic agent nortriptyline and worse response to the serotonergic escitalopram, hinting that inflammatory status might guide first-line drug class. Trials of adjunctive anti-inflammatory agents (pooled in the meta-analysis by Köhler and colleagues, alongside celecoxib studies and more mixed minocycline data) are positive on balance but limited by small samples, heterogeneity, and probable publication bias.
Clinical correlates and current standing
The honest summary: no major guideline yet recommends inflammatory markers for diagnosis or treatment selection, and none should until the stratification findings are prospectively replicated in adequately powered, biomarker-stratified trials, the design the infliximab trial pointed toward but did not itself constitute. CRP's appeal is pragmatic (it is inexpensive, standardized, and already on most lab menus), but appeal is not validation.
The convergence
Inflammation sits at a busy intersection in this library. Upstream, it is one of the principal effectors of chronic stress and allostatic load and overlaps heavily with metabolic dysfunction — which is also its chief confounder. Downstream, it converges on the same final common pathways as everything else: reduced monoaminergic and especially dopaminergic tone, kynurenine-mediated glutamatergic disturbance, and suppressed neuroplasticity. As a biomarker, its companion documents are neuroimaging (TSPO-PET measures central neuroinflammation directly) and the synthesis in the series capstone.
Caveats — load-bearing, not decorative
The central problem is specificity. CRP rises with obesity, smoking, age, metabolic syndrome, infection, and physical inactivity — all of which travel with depression. Much of the depression–CRP association is attenuated, though not eliminated, by adjustment for body mass index (Chamberlain and colleagues, 2019; genetic analyses by Pitharouli and colleagues, 2021). An "inflammatory subtype" defined by CRP may substantially overlap with a metabolic subtype, and disentangling cause, consequence, and shared confound is unresolved.
Three further caveats apply. Peripheral cytokines correlate only imperfectly with central neuroinflammation, so blood is a noisy proxy for brain. Inflammatory markers are state-sensitive and fluctuate, complicating their use as stable trait indicators. And no validated, guideline-endorsed cutoff exists: the 3 mg/L and 5 mg/L thresholds are borrowed from cardiovascular medicine and the trial literature, not derived for psychiatric use. Beyond all of these, the effect sizes remain group-level. Knowing that a population's mean CRP is elevated tells you little about the individual in front of you.
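The gap between a replicated group difference and a useful individual test can be quantified. Assuming the marker is normally distributed with equal variance in patients and controls, a standardized mean difference d converts to an AUC of Φ(d/√2) and to an overlapping coefficient of 2Φ(−|d|/2). The sketch applies these standard conversions to the 0.1 to 0.5 range reported in the meta-analyses above. Real CRP distributions are right-skewed, so treat the numbers as rough.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def auc_from_d(d):
    """Probability that a randomly chosen case scores higher than a randomly
    chosen control, assuming two equal-variance normal distributions."""
    return STD_NORMAL.cdf(d / 2 ** 0.5)


def overlap_from_d(d):
    """Overlapping coefficient of two equal-variance normal distributions
    whose means differ by d standard deviations."""
    return 2 * STD_NORMAL.cdf(-abs(d) / 2)


for d in (0.1, 0.3, 0.5):
    print(f"d = {d}: AUC = {auc_from_d(d):.2f}, distribution overlap = {overlap_from_d(d):.0%}")
```

Even at the top of that range (d = 0.5), the AUC is only about 0.64 and roughly 80% of the two distributions overlap. At d = 0.1 the marker barely beats a coin flip.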
Bottom line
Inflammatory and immune markers are the most mature biomarker program in psychiatry and the clearest proof of concept that stratification is achievable: the infliximab result shows a biomarker-defined treatment interaction that average-effect trials would miss. They are not yet ready for clinical decision-making, held back by poor specificity (especially the metabolic confound), state sensitivity, the absence of validated cutoffs, and a stratification hypothesis that has been demonstrated but not prospectively confirmed. They are best understood as the field's leading template for biomarker-guided care — instructive precisely because it shows both how a usable biomarker would behave and how far the current candidates fall short.
Selected references
- Howren MB, Lamkin DM, Suls J. Associations of depression with CRP, IL-1, and IL-6: a meta-analysis. Psychosom Med. 2009.
- Dowlati Y, et al. A meta-analysis of cytokines in major depression. Biol Psychiatry. 2010.
- Haapakoski R, et al. Cumulative meta-analysis of interleukins 6 and 1β, TNF-α and CRP in patients with MDD. Brain Behav Immun. 2015.
- Osimo EF, et al. Inflammatory markers in depression: a meta-analysis of mean differences and variability. Brain Behav Immun. 2020.
- Raison CL, et al. A randomized controlled trial of the TNF antagonist infliximab for treatment-resistant depression. JAMA Psychiatry. 2013.
- Miller AH, Raison CL. The role of inflammation in depression: from evolutionary imperative to modern treatment target. Nat Rev Immunol. 2016.
- Felger JC, Miller AH. Cytokine effects on the basal ganglia and dopamine function. Front Neuroendocrinol. 2012.
- Felger JC, et al. Inflammation is associated with decreased functional connectivity within corticostriatal reward circuitry in depression. Mol Psychiatry. 2016.
- Uher R, et al. An inflammatory biomarker as a differential predictor of outcome of depression treatment with escitalopram and nortriptyline. Am J Psychiatry. 2014.
- Köhler O, et al. Effect of anti-inflammatory treatment on depression: a systematic review and meta-analysis. JAMA Psychiatry. 2014.
- Khandaker GM, et al. Association of serum interleukin 6 and CRP in childhood with depression and psychosis in young adult life. JAMA Psychiatry. 2014.
- Chamberlain SR, et al. Treatment-resistant depression and peripheral C-reactive protein. Br J Psychiatry. 2019.
- Pitharouli MC, et al. Elevated C-reactive protein in patients with depression, independent of genetic, health, and psychosocial factors. Am J Psychiatry. 2021.
- Beurel E, Toups M, Nemeroff CB. The bidirectional relationship of depression and inflammation. Neuron. 2020.
- Wittenberg GM, et al. Effects of immunomodulatory drugs on depressive symptoms: a meta-analysis. Mol Psychiatry. 2020.
- Setiawan E, et al. Role of translocator protein density (microglial activation) in major depressive episodes. JAMA Psychiatry. 2015.
- Hannestad J, DellaGioia N, Bloch M. The effect of antidepressant medication treatment on serum levels of inflammatory cytokines: a meta-analysis. Neuropsychopharmacology. 2011.
- Khandaker GM, et al. Shared mechanisms between coronary heart disease and depression: the inflammation hypothesis. Mol Psychiatry. 2017.
- Drevets WC, et al. Immune targets for therapeutic development in depression. Nat Rev Drug Discov. 2022.
- Bullmore E. The Inflamed Mind. 2018.
Neuroimaging Biomarkers in Psychiatry
The proposition
If psychiatric disorders are disorders of brain circuits, then imaging those circuits ought to yield biomarkers — measurements that diagnose, subtype, or predict response where symptom checklists cannot. Three decades of neuroimaging have produced an enormous, reproducible body of group-level findings and a sobering record at the only level that matters clinically: the individual scan. The gap between those two is the most important fact in this document, and the recent reckoning over how large samples must be to close it has reshaped the entire field's expectations.
The evidence: structure
The most robust structural finding in depression is reduced hippocampal volume, established at scale by the ENIGMA-MDD consortium's analyses of thousands of patients (Schmaal and colleagues, 2016). The effect is small, concentrated in recurrent and chronic illness, and consistent with the neuroplasticity and BDNF account of stress-related hippocampal atrophy and impaired neurogenesis. ENIGMA also documented widespread, subtle cortical thinning (Schmaal and colleagues, 2017). Amygdala volume findings have been inconsistent — a useful reminder that even well-studied regions do not yield stable structural markers.
The evidence: function and connectivity
Functionally, the most influential single target is the subgenual anterior cingulate cortex (sgACC, Brodmann area 25). Mayberg's work tied sgACC hyperactivity to depressed states and its normalization to recovery, which directly motivated sgACC as a deep brain stimulation target (a thread the planned Interventional and Neurostimulation series will pick up). At the network level, meta-analyses (Kaiser and colleagues, 2015; Mulders and colleagues, 2015) describe a recurring frontolimbic signature: elevated default mode network connectivity, linked to rumination, alongside reduced frontoparietal control-network engagement — the imaging correlate of the DMN circuit dysfunction account.
For treatment prediction, the best-developed marker is rostral/pregenual anterior cingulate activity, where elevated resting theta or metabolic activity predicts antidepressant response (Pizzagalli and colleagues). The EMBARC trial tested this prospectively, collecting EEG-derived rACC signal alongside reward-circuit fMRI with the aim of predicting differential response to sertraline versus placebo. It is among the more rigorous attempts to move an imaging marker from association to prediction.
The cautionary tale: connectivity biotypes
No result better captures both the promise and the peril of this field than Drysdale and colleagues (2017). Using resting-state functional connectivity, the authors reported four "biotypes" of depression that mapped onto distinct symptom profiles and, strikingly, predicted response to transcranial magnetic stimulation. The paper was high-profile and widely cited as proof that data-driven imaging could carve depression at its joints.
It did not replicate. Independent reanalysis (Dinga and colleagues, 2019) found that the clustering was not robust — the apparent biotype structure was sensitive to analytic choices and did not survive in independent data, and subsequent replication attempts have been unconvincing. This episode is load-bearing for the entire library, not just this document: it is the imaging counterpart to the candidate-gene reckoning in genetic and pharmacogenomic testing, and it is why the Capstone Synthesis treats clean biological subtypes as a hypothesis the data have so far declined to confirm. The lesson is not that biotypes are impossible but that small-sample, high-dimensional clustering generates seductive structure that vanishes under replication.
The reckoning: how big must samples be?
The deeper problem was quantified by Marek and colleagues (2022). Analyzing brain-wide association studies, they showed that reproducible brain–behavior correlations from MRI require thousands of participants, not the dozens-to-hundreds typical of the field — because true effect sizes are tiny and small samples inflate them dramatically, producing findings that fail to replicate. Compounding this, the test–retest reliability of many task-fMRI measures is poor (Elliott and colleagues, 2020), meaning the same person scanned twice may yield substantially different values — fatal for an individual-level biomarker.
Together these papers explain the group-versus-individual gap: the group-level findings are real but small, and the analytic culture that produced exciting individual-prediction claims was systematically underpowered.
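Two standard formulas show why "thousands" is the right order of magnitude. The first approximates the sample size needed to detect a correlation r (two-tailed α = 0.05, 80% power) via the Fisher z transformation. The second is Spearman's attenuation formula, which shrinks an observable correlation by the square root of the product of the two measures' reliabilities. The effect sizes and reliabilities in the example are hypothetical illustrations, not values reported by Marek and colleagues or Elliott and colleagues.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate n to detect correlation r (two-tailed) via Fisher's z:
    n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3, rounded up."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


def attenuated_r(true_r, reliability_x, reliability_y):
    """Spearman's attenuation: measurement error in either variable shrinks
    the correlation that can actually be observed."""
    return true_r * math.sqrt(reliability_x * reliability_y)


for r in (0.3, 0.1, 0.05):
    print(f"r = {r}: n ~ {n_for_correlation(r)}")

# Hypothetical reliabilities: a behavioral score (0.8) and a task-fMRI measure (0.4).
observed = attenuated_r(0.1, 0.8, 0.4)
print(f"true r = 0.1 -> observable r ~ {observed:.3f}, n ~ {n_for_correlation(observed)}")
```

Under these assumptions, detecting r = 0.1 needs roughly 780 participants and r = 0.05 more than 3,000. Routing a true r of 0.1 through an unreliable imaging measure pushes the requirement to roughly 2,500.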
PET and molecular imaging
Positron emission tomography offers more mechanistically specific markers but remains research-grade. Serotonin and dopamine receptor and transporter imaging has informed the monoaminergic and serotonergic pharmacology accounts without yielding a diagnostic test. The most direct in-vivo measure of central neuroinflammation is TSPO-PET (translocator protein), elevated in depressive episodes (Setiawan and colleagues, 2015) — a finding that complements the peripheral markers in the inflammatory and immune document, though TSPO binding is itself confounded by a common genetic polymorphism and questions of cellular specificity. PET's cost and radiation exposure preclude routine clinical use.
The convergence
Neuroimaging is where this library's two convergence points become visible rather than inferred. The hippocampal and prefrontal structural findings are the anatomical signature of the neuroplasticity/BDNF downstream hub; the frontolimbic functional signature is the circuit-level readout of chronic stress acting upstream. As a biomarker modality, imaging's natural partners are the inflammatory markers (TSPO-PET) and electrophysiology (rACC theta bridges EEG and fMRI), synthesized in the series capstone.
Caveats — load-bearing, not decorative
Four caveats define the field's current standing. First, effect sizes are small and group-level: a reproducible mean difference across thousands of patients does not translate into a usable individual classifier. Second, reproducibility was historically poor for reasons now understood — underpowered samples inflating effects, flexible analytic pipelines, and low measurement reliability. Third, nothing is diagnostically validated: no neuroimaging test is cleared for diagnosing or subtyping any common psychiatric disorder, and clinical MRI in psychiatry remains a tool for excluding structural pathology, not for making positive diagnoses. Fourth, cost and access constrain even validated markers to specialist and research settings.
The constructive counterpoint is that the field's failures have produced its corrective. Large consortia (ENIGMA), normative modeling approaches that locate an individual against a population distribution rather than a binary diagnosis (Marquand and colleagues), and the new sample-size standards set by the brain-wide association literature represent a methodologically chastened path toward markers that might actually replicate.
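The sample-size and reliability points interact, and a back-of-the-envelope calculation shows why. Classical test theory implies that random measurement error attenuates an observed correlation to roughly the true correlation times √(reliability_x × reliability_y). The Fisher z approximation then gives the sample size needed to detect that observed correlation. The Python sketch below uses a hypothetical true brain–behavior correlation of 0.10 and hypothetical reliabilities, and assumes a two-sided α of 0.05 and 80% power. It is an approximation, not a substitute for a formal power analysis.

```python
from math import atanh, ceil, sqrt
from statistics import NormalDist


def attenuated_r(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation when both measures carry random error
    (classical test theory: r_obs = r_true * sqrt(rel_x * rel_y))."""
    return true_r * sqrt(rel_x * rel_y)


def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n to detect correlation r with a two-sided test, using
    the Fisher z approximation n = ((z_{alpha/2} + z_beta) / atanh(r))^2 + 3."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    true_r = 0.10  # hypothetical true brain-behavior correlation
    for rel_x, rel_y in [(1.0, 1.0), (0.8, 0.8), (0.5, 0.8)]:  # hypothetical reliabilities
        r_obs = attenuated_r(true_r, rel_x, rel_y)
        print(f"reliability {rel_x}/{rel_y}: observed r = {r_obs:.3f}, n = {n_for_correlation(r_obs)}")
```

With perfect measurement, r = 0.10 needs roughly 780 participants. With reliabilities of 0.5 and 0.8, the observed correlation shrinks to about 0.063 and the required sample grows to nearly 2,000.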
Bottom line
Neuroimaging has delivered a deep, reproducible map of the brain changes that accompany psychiatric illness at the group level, and it visualizes this library's core convergence themes more directly than any other modality. It has not delivered a clinically usable individual biomarker, and the recent quantification of why — tiny true effects, low measurement reliability, and a legacy of underpowered studies — has been clarifying rather than merely deflating. The Drysdale biotype episode stands as the field's defining cautionary tale. The realistic near-term value of imaging is in stratified research and normative modeling, not in the clinic, and claims of imaging-based diagnosis or subtyping should be met with the skepticism that the replication record has earned.
Selected references
- Schmaal L, et al. (ENIGMA-MDD). Subcortical brain alterations in major depressive disorder. Mol Psychiatry. 2016.
- Schmaal L, et al. (ENIGMA-MDD). Cortical abnormalities in adults and adolescents with major depression. Mol Psychiatry. 2017.
- Mayberg HS, et al. Deep brain stimulation for treatment-resistant depression. Neuron. 2005.
- Drysdale AT, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nat Med. 2017.
- Dinga R, et al. Evaluating the evidence for biotypes of depression: methodological replication and extension. NeuroImage Clin. 2019.
- Pizzagalli DA, et al. Pretreatment rostral anterior cingulate cortex theta activity in relation to symptom improvement in depression (EMBARC). JAMA Psychiatry. 2018.
- Marek S, et al. Reproducible brain-wide association studies require thousands of individuals. Nature. 2022.
- Elliott ML, et al. What is the test-retest reliability of common task-functional MRI measures? Psychol Sci. 2020.
- Kaiser RH, et al. Large-scale network dysfunction in major depressive disorder: meta-analysis of resting-state functional connectivity. JAMA Psychiatry. 2015.
- Mulders PC, et al. Resting-state functional connectivity in major depressive disorder: a review. Neurosci Biobehav Rev. 2015.
- Setiawan E, et al. Role of translocator protein density in major depressive episodes. JAMA Psychiatry. 2015.
- Williams LM. Precision psychiatry: a neural circuit taxonomy for depression and anxiety. Lancet Psychiatry. 2016.
- Marquand AF, et al. Conceptualizing mental disorders as deviations from normative functioning. Mol Psychiatry. 2019.
- Woo CW, et al. Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci. 2017.
- Etkin A. A reckoning and research agenda for neuroimaging in psychiatry. Am J Psychiatry. 2019.
- Fonzo GA, et al. Brain regulation of emotional conflict predicts antidepressant response. Nat Hum Behav. 2019.
- Insel TR, Cuthbert BN. Brain disorders? Precisely (RDoC). Science. 2015.
- Gong Q, He Y. Depression, neuroimaging and connectomics: a selective overview. Biol Psychiatry. 2015.
- Hamilton JP, et al. Default-mode and task-positive network activity in major depressive disorder. Biol Psychiatry. 2011.
- Dunlop BW, Mayberg HS. Neuroimaging-based biomarkers for treatment selection in major depressive disorder. Dialogues Clin Neurosci. 2014.
Genetic and Pharmacogenomic Testing in Psychiatry
Diagnostics & Biomarkers series
The proposition
Genetics promised psychiatry two distinct things, and it is essential to keep them apart. The first was diagnostic and risk prediction: find the genes for depression, schizophrenia, or bipolar disorder, and you could screen, stratify, and understand mechanism. The second was pharmacogenomics: use a patient's genotype to choose or dose their medication. These have had opposite fates. The risk-prediction program produced a foundational, humbling reckoning with a generation of false leads; the pharmacogenomic program produced a small core of genuinely actionable findings buried in a much larger field of commercial overreach. This document treats them separately because conflating them is the most common error in how psychiatric genetics is discussed.
Risk prediction: the polygenic reality
Psychiatric disorders are highly heritable (twin estimates put major depression near 35–40% and schizophrenia and bipolar disorder considerably higher), but that heritability is spread across thousands of common variants of individually minuscule effect. Large genome-wide association studies (GWAS) have made this concrete. The Psychiatric Genomics Consortium analyses (Wray and colleagues, 2018, identifying 44 loci), subsequent work (Howard and colleagues, 2019, identifying 102 loci), and later expansions such as the bi-ancestral Million Veteran Program analysis (Levey and colleagues, 2021) confirmed a highly polygenic architecture. The single-nucleotide-polymorphism heritability captured by these studies is modest (roughly 9% for depression), and polygenic risk scores built from them explain only a few percent of variance in liability.
The clinical implication is unambiguous and important: polygenic risk scores are not yet useful for individual prediction or diagnosis. They are powerful research tools and may eventually contribute to stratified models, but a score that explains 2–3% of variance cannot screen, diagnose, or guide an individual's care. Claims to the contrary outrun the data.
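What "explains 2–3% of variance" means for an individual can be sketched with a simulation. The Python below assumes a liability-threshold model. A standardized polygenic score accounts for a fraction r2 of the variance in a normally distributed liability, and anyone above the threshold that yields a chosen prevalence counts as a case. The sketch then estimates the score's AUC with the Mann–Whitney rank formula. The 15% prevalence and the random seed are arbitrary assumptions, and the model ignores ancestry, ascertainment, and measurement error.

```python
import random
from statistics import NormalDist


def simulate_prs_auc(r2: float, prevalence: float, n: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo AUC of a polygenic score under an assumed liability-threshold model."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - prevalence)  # liability cut-off giving this prevalence
    people = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        # Liability = scaled score + independent residual, total variance 1.
        liability = r2 ** 0.5 * score + (1 - r2) ** 0.5 * rng.gauss(0.0, 1.0)
        people.append((score, liability > threshold))
    # Mann-Whitney estimate: rank everyone by score and sum the ranks of cases.
    people.sort(key=lambda person: person[0])
    n_cases = sum(1 for _, is_case in people if is_case)
    n_controls = n - n_cases
    case_rank_sum = sum(rank for rank, (_, is_case) in enumerate(people, start=1) if is_case)
    return (case_rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


if __name__ == "__main__":
    for r2 in (0.025, 0.10):  # share of liability variance explained by the score
        print(f"R^2 = {r2:.3f}: AUC ~ {simulate_prs_auc(r2, prevalence=0.15):.3f}")
```

Under these assumptions, a score explaining 2.5% of liability variance yields an AUC of only roughly 0.58. That is far from the discrimination a screening or diagnostic test needs.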
The candidate-gene reckoning
Before GWAS, psychiatric genetics ran for two decades on the candidate-gene paradigm: pick a biologically plausible gene — most famously the serotonin transporter promoter polymorphism (5-HTTLPR/SLC6A4), but also BDNF, COMT, MAOA, and others — and test it for association, often in gene-by-environment interaction models. The canonical result was Caspi and colleagues (2003), reporting that 5-HTTLPR moderated the effect of stressful life events on depression. It became one of the most cited findings in the field.
It did not hold. Large meta-analyses (Risch and colleagues, 2009) failed to confirm the 5-HTTLPR × stress interaction, and a definitive collaborative reanalysis (Culverhouse and colleagues, 2018) found no support. The decisive blow came from Border and colleagues (2019), who tested the 18 most-studied candidate genes for depression in samples far larger than the original studies and found that these "established" genes showed no more association with depression than randomly chosen variants. The historical candidate-gene literature, in other words, was largely a catalog of false positives generated by small samples, flexible analysis, and publication bias.
This reckoning is load-bearing across the entire library. It is why the Monoaminergic Dysfunction document treats the 5-HTTLPR story as cautionary and why the Genetics and Epigenetics etiology document leads with polygenicity rather than candidate genes. It is also the genetic counterpart to the imaging biotype failure described in the neuroimaging document. The unifying lesson, that small samples plus high analytic flexibility manufacture findings that vanish on replication, recurs throughout this series.
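The mechanism behind that lesson, small samples inflating the effects that get reported, can be demonstrated in a few lines. This Python sketch simulates many small studies of a hypothetical true correlation of 0.10, each with n = 50. It "publishes" only those significant at a two-sided α of 0.05 in the hypothesized direction (Fisher z test) and compares the average published estimate with the truth. It models publication filtering only, not the additional inflation from analytic flexibility, and every parameter is an assumption chosen for illustration.

```python
import random
from math import atanh, sqrt
from statistics import fmean


def sample_correlation(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def winners_curse(true_r: float = 0.10, n: int = 50, studies: int = 2000, seed: int = 7):
    """Simulate many small studies and keep only significant positive results.

    Returns (share of studies 'published', mean published correlation)."""
    rng = random.Random(seed)
    z_crit = 1.959964  # two-sided alpha = 0.05
    published = []
    for _ in range(studies):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [true_r * x + sqrt(1 - true_r ** 2) * rng.gauss(0.0, 1.0) for x in xs]
        r = sample_correlation(xs, ys)
        if atanh(r) * sqrt(n - 3) > z_crit:  # Fisher z test, hypothesized direction only
            published.append(r)
    return len(published) / studies, fmean(published)


if __name__ == "__main__":
    share, mean_r = winners_curse()
    print(f"published: {share:.0%} of studies; mean published r = {mean_r:.2f} (true r = 0.10)")
```

At n = 50, only estimates above roughly r = 0.28 can reach significance. Every "published" estimate is therefore at least 2.8 times the true effect, and only about one study in ten succeeds.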
Pharmacogenomics: where the signal is real
Pharmacogenomics is a different enterprise, and parts of it are genuinely actionable. The strongest evidence concerns pharmacokinetic genes — chiefly the cytochrome P450 enzymes CYP2D6 and CYP2C19, which metabolize many antidepressants and antipsychotics. Poor metabolizers accumulate drug and risk toxicity at standard doses; ultrarapid metabolizers underexpose and may appear "treatment-resistant" for purely kinetic reasons. The Clinical Pharmacogenetics Implementation Consortium (CPIC) and the Dutch Pharmacogenetics Working Group publish dosing guidance — for example, CYP2C19 phenotype and citalopram/escitalopram dosing, and CYP2D6 phenotype and tricyclic or venlafaxine dosing — and the FDA references several of these pairs in drug labeling. This connects directly to the metabolism caveats in the serotonergic pharmacology document.
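CYP2D6 genotype is usually translated into phenotype with an activity score. Each allele is assigned a function value, and the two values are summed (multiplied by copy number for duplications). The total then maps to a metabolizer phenotype. The Python sketch below assumes the allele values and cut-points of the 2019–2020 CPIC/DPWG consensus as we understand them: 0 poor; 0.25–1 intermediate; 1.25–2.25 normal; above 2.25 ultrarapid. It uses only a small illustrative allele subset. Earlier guidelines, including the 2015 and 2017 CPIC guidelines in this document's references, used different cut-points, so the same genotype can be labeled differently depending on version. This is a teaching sketch, not a clinical tool, and current CPIC tables should be checked for any real use.

```python
# Illustrative subset of CYP2D6 allele function values, ASSUMED from the
# 2019-2020 CPIC/DPWG activity-score consensus; not a complete or authoritative table.
ALLELE_ACTIVITY = {
    "*1": 1.0, "*2": 1.0, "*35": 1.0,            # normal function
    "*9": 0.5, "*17": 0.5, "*41": 0.5,           # decreased function
    "*10": 0.25,                                  # decreased function (lower assigned value)
    "*3": 0.0, "*4": 0.0, "*5": 0.0, "*6": 0.0,  # no function (*5 = gene deletion)
}


def allele_activity(allele: str) -> float:
    """Activity of one haplotype; an 'xN' suffix means N gene copies, e.g. '*1x2'."""
    base, _, copies = allele.partition("x")
    return ALLELE_ACTIVITY[base] * (int(copies) if copies else 1)


def cyp2d6_phenotype(allele_a: str, allele_b: str) -> tuple[float, str]:
    """Sum the two haplotype activities and map the score to a phenotype label."""
    score = allele_activity(allele_a) + allele_activity(allele_b)
    if score == 0:
        label = "poor metabolizer"
    elif score < 1.25:
        label = "intermediate metabolizer"
    elif score <= 2.25:
        label = "normal metabolizer"
    else:
        label = "ultrarapid metabolizer"
    return score, label


if __name__ == "__main__":
    for a, b in [("*1", "*1"), ("*1", "*4"), ("*4", "*4"), ("*10", "*10"), ("*1x2", "*1")]:
        print(a, b, cyp2d6_phenotype(a, b))
```

The phenotype label is the input that dosing guidance keys on. It describes expected drug exposure, not whether the drug will work.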
A small number of pharmacodynamic variants are also strongly actionable, though more for safety than efficacy. HLA-B*15:02 and HLA-A*31:01 predict serious cutaneous reactions to carbamazepine, with clear screening recommendations in at-risk populations (relevant to the mood stabilizers document). The broader case for panel-based pharmacogenetic implementation was strengthened by the PREPARE trial (Swen and colleagues, 2023), which found that a 12-gene panel guiding prescribing across multiple specialties reduced clinically relevant adverse drug reactions. That is evidence that pharmacokinetic genotyping has system-level value, even if it speaks more to safety and tolerability than to choosing the right drug for a given depression.
Combinatorial panels: the overreach
The commercial psychiatric pharmacogenomic industry has run well ahead of this evidence. Combinatorial panels (such as GeneSight) bundle pharmacokinetic and pharmacodynamic genes into proprietary algorithms that output color-coded "use as directed / use with caution" recommendations. The pivotal GUIDED trial (Greden and colleagues, 2019) is the standard reference — and it is widely misread. Its primary endpoint, symptom improvement, was not statistically significant; only secondary endpoints (response and remission) reached significance, in an industry-sponsored trial whose blinding has been questioned. Independent reviewers (Zeier and colleagues, 2018; subsequent consensus statements) have been consistently cautious, and the FDA issued a safety communication warning that claims linking specific genotypes to specific drug responses for many such panels are not supported by adequate evidence. The actionable core — single-gene CYP guidance — is real; the proprietary algorithmic superstructure built on top of it is not validated to the standard its marketing implies.
The convergence
Genetics sits at the headwaters of this library's causal web. Polygenic liability is the most distal cause of the chronic stress susceptibility and downstream neuroplasticity vulnerability that the Capstone Synthesis brackets, while epigenetic mechanisms are how early-life adversity becomes biologically embedded. As a biomarker, genetics divides cleanly: risk scores converge with neuroimaging and inflammatory markers in the not-yet-clinical category, while pharmacokinetic genotyping is among the few biomarkers in this whole series with a defensible present-day clinical role.
Caveats — load-bearing, not decorative
For risk prediction: polygenic scores explain too little variance for individual use, derive overwhelmingly from European-ancestry samples and transfer poorly across populations, and predict liability rather than diagnosis. For pharmacogenomics: kinetic genotyping tells you about drug exposure, not whether a drug will treat a given patient's depression — it improves tolerability and dosing, not the fundamental problem of matching mechanism to illness. Combinatorial panels remain commercially oversold relative to their evidence. And across both domains, genotype is fixed while psychiatric illness is dynamic, contextual, and only partly biological — a hard ceiling on what any DNA-based test can contribute.
Bottom line
Keep the two programs separate. Genetic risk prediction in psychiatry has matured into honest polygenicity and, through the candidate-gene reckoning, produced one of the field's most important methodological lessons, but it offers no clinically usable diagnostic or risk biomarker today. Pharmacogenomics contains a small, real, actionable core (CYP2D6/CYP2C19 dosing and a few HLA safety alleles) that is among the most defensible biomarkers in this series. Around that core sits a commercial layer of combinatorial panels whose marketing outruns the GUIDED-trial evidence. The disciplined position is to use single-gene pharmacokinetic guidance where it applies, screen for the relevant safety alleles, and treat both polygenic risk scores and proprietary response-prediction panels as not yet ready for the decisions they claim to inform.
Selected references
- Wray NR, et al. (PGC). Genome-wide association analyses identify 44 risk variants for major depression. Nat Genet. 2018.
- Howard DM, et al. Genome-wide meta-analysis of depression identifies 102 independent variants. Nat Neurosci. 2019.
- Levey DF, et al. Bi-ancestral depression GWAS in the Million Veteran Program. Nat Neurosci. 2021.
- Border R, et al. No support for historical candidate gene or candidate gene-by-interaction hypotheses for major depression. Am J Psychiatry. 2019.
- Caspi A, et al. Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science. 2003.
- Risch N, et al. Interaction between the serotonin transporter gene (5-HTTLPR), stressful life events, and risk of depression: a meta-analysis. JAMA. 2009.
- Culverhouse RC, et al. Collaborative meta-analysis finds no evidence of a strong interaction between stress and 5-HTTLPR genotype contributing to depression. Mol Psychiatry. 2018.
- Hicks JK, et al. (CPIC). Guideline for CYP2D6 and CYP2C19 genotypes and dosing of selective serotonin reuptake inhibitors. Clin Pharmacol Ther. 2015.
- Hicks JK, et al. (CPIC). Guideline for CYP2D6 and CYP2C19 genotypes and dosing of tricyclic antidepressants. Clin Pharmacol Ther. 2017.
- Bousman CA, et al. Clinical pharmacogenetics implementation: a review of antidepressant and antipsychotic dosing guidelines. Pharmacogenomics. 2021.
- Greden JF, et al. Combinatorial pharmacogenomics for depression: the GUIDED randomized clinical trial. J Psychiatr Res. 2019.
- Zeier Z, et al. Clinical implementation of pharmacogenetic decision support tools for antidepressant drug prescribing. Am J Psychiatry. 2018.
- Swen JJ, et al. (PREPARE). A 12-gene pharmacogenetic panel to prevent adverse drug reactions: an open-label, multicentre, controlled trial. Lancet. 2023.
- US FDA. Safety communication: genetic tests claiming to predict patient response to specific medications. 2018–2019.
- Fabbri C, et al. Genetics of treatment outcomes in major depressive disorder. Neurosci Biobehav Rev. 2020.
- Sullivan PF, Neale MC, Kendler KS. Genetic epidemiology of major depression: review and meta-analysis. Am J Psychiatry. 2000.
- Sullivan PF, et al. Psychiatric genomics: an update and an agenda. Am J Psychiatry. 2018.
- Martin AR, et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet. 2019.
- Bousman CA, et al. Review and consensus on pharmacogenomic testing in psychiatry. Pharmacopsychiatry. 2021.
- McMahon FJ, Insel TR. Pharmacogenomics and personalized medicine in neuropsychiatry. Neuron. 2012.
Electrophysiological Biomarkers in Psychiatry
Diagnostics & Biomarkers series
The proposition
Electroencephalography has a structural advantage no other biomarker modality can match: it is inexpensive, portable, requires no radioligand or magnet, and reads neural activity at the millisecond timescale on which the brain actually computes. If any biomarker class is positioned to be deployed at scale — in community clinics, even at home — it is this one. That promise is exactly why the gap between EEG's scalability and its validation matters so much, and why the modality has accumulated both genuinely interesting predictive signals and a conspicuous record of commercial overreach.
The evidence: resting quantitative EEG
The oldest qEEG proposal is frontal alpha asymmetry. Because alpha power is inversely related to cortical activity, relatively greater left-than-right frontal alpha power indicates relative left frontal hypoactivation. Within Davidson's approach–withdrawal framework, that pattern is interpreted as a marker of depressive affective style. The measure has intuitive appeal and a large literature, but its test–retest reliability and its discriminative validity at the individual level are weak (Tenke and Kayser), and it has not matured into a usable diagnostic.
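The asymmetry index is conventionally computed as the natural log of right-hemisphere alpha power minus the natural log of left-hemisphere alpha power, most often from electrodes F4 and F3. The Python sketch below assumes an 8–13 Hz alpha band and computes band power with a naive discrete Fourier transform on a short synthetic epoch. Real pipelines add artifact rejection, re-referencing, and averaging across many epochs (for example with Welch's method), all omitted here. Because alpha power is inversely related to cortical activity, a positive value indicates relatively greater left frontal activity.

```python
import math


def band_power(signal: list[float], fs: float, lo: float, hi: float) -> float:
    """Summed DFT power over frequency bins within [lo, hi] Hz (naive DFT, short epochs only)."""
    n = len(signal)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if lo <= freq <= hi:
            re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(signal))
            total += re * re + im * im
    return total


def frontal_alpha_asymmetry(left: list[float], right: list[float], fs: float,
                            band: tuple[float, float] = (8.0, 13.0)) -> float:
    """ln(right alpha power) - ln(left alpha power), e.g. F4 relative to F3."""
    return math.log(band_power(right, fs, *band)) - math.log(band_power(left, fs, *band))


if __name__ == "__main__":
    fs, n = 256.0, 512  # hypothetical 2 s epoch sampled at 256 Hz
    times = [i / fs for i in range(n)]
    # Synthetic channels: 10 Hz alpha plus a 3 Hz component outside the band.
    f3 = [1.0 * math.sin(2 * math.pi * 10 * t) + 2.0 * math.sin(2 * math.pi * 3 * t) for t in times]
    f4 = [1.5 * math.sin(2 * math.pi * 10 * t) + 2.0 * math.sin(2 * math.pi * 3 * t) for t in times]
    print(f"FAA = {frontal_alpha_asymmetry(f3, f4, fs):.3f}")
```

Here the right channel carries 1.5 times the alpha amplitude of the left, so the index equals ln(1.5²) ≈ 0.81. The 3 Hz component lies outside the band and has no effect.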
More clinically oriented is prefrontal theta cordance, where an early decrease (within the first one to two weeks of treatment) predicts later antidepressant response (Leuchter, Cook, Bares, and colleagues). The logic is attractive: a week-one signal that forecasts a week-six outcome would let clinicians abandon non-responders early. The effect is real in aggregate but modest, the metric is implemented inconsistently across groups, and it has not been prospectively validated to the standard required for routine decision-making.
The signal worth watching: rostral ACC theta
The single most credible electrophysiological predictor of antidepressant response is rostral anterior cingulate theta activity, source-localized from scalp EEG (Pizzagalli and colleagues). Higher pretreatment rACC theta predicts better outcomes across multiple antidepressant classes. Its importance is amplified by convergence across modalities: the rACC also emerges as a response predictor in PET and fMRI, and the EMBARC trial used EEG-derived rACC signal alongside reward-circuit imaging in a prospective prediction design. When the same anatomical signal surfaces in independent measurement methods, the underlying finding is more likely to be real — this is the methodological mirror image of the single-modality biotype failures described in the neuroimaging document.
Event-related potentials
Stimulus-locked potentials offer more mechanistically specific markers. The loudness dependence of the auditory evoked potential (LDAEP) is a proposed index of central serotonergic tone: a strong LDAEP is thought to reflect low serotonergic activity and has been associated with better response to serotonergic antidepressants (Juckel and colleagues), connecting electrophysiology to the serotonergic pharmacology account. The error-related negativity (ERN) is a robust, heritable transdiagnostic marker of internalizing pathology and anxiety (Hajcak and colleagues), and the reward positivity / feedback-related negativity indexes reward processing and tracks anhedonia — a direct electrophysiological link to the dopaminergic reward and anhedonia account. These are valuable research markers; none is a validated clinical test.
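The LDAEP is typically quantified as the slope of a linear regression of N1/P2 peak-to-peak amplitude on stimulus intensity across several loudness levels. The Python sketch below assumes five intensity levels and hypothetical amplitudes, and reports the slope in microvolts per 10 dB (one common convention). Peak detection, the choice between scalp and dipole-source measures, and artifact handling are deliberately left out.

```python
from statistics import fmean


def ldaep_slope(intensities_db: list[float], amplitudes_uv: list[float]) -> float:
    """Least-squares slope of N1/P2 amplitude on stimulus intensity, in microvolts per 10 dB."""
    if len(intensities_db) != len(amplitudes_uv) or len(intensities_db) < 2:
        raise ValueError("need two or more paired intensity/amplitude values")
    mx, my = fmean(intensities_db), fmean(amplitudes_uv)
    sxy = sum((x - mx) * (y - my) for x, y in zip(intensities_db, amplitudes_uv))
    sxx = sum((x - mx) ** 2 for x in intensities_db)
    return 10 * sxy / sxx


if __name__ == "__main__":
    levels = [60.0, 70.0, 80.0, 90.0, 100.0]  # hypothetical intensity levels (dB)
    amplitudes = [4.0, 5.5, 7.0, 8.5, 10.0]   # hypothetical N1/P2 amplitudes (microvolts)
    print(f"LDAEP = {ldaep_slope(levels, amplitudes):.2f} uV/10 dB")
```

A steeper slope (stronger loudness dependence) is the pattern the serotonergic hypothesis links to lower serotonergic tone.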
TMS-EEG: target engagement, not diagnosis
Combining transcranial magnetic stimulation with simultaneous EEG (TMS-EEG) probes cortical excitability and effective connectivity directly through TMS-evoked potentials, yielding indices that bear on cortical inhibition/excitation balance — relevant to the glutamatergic and GABAergic accounts. Its most promising near-term use is not diagnosis but target engagement and protocol optimization in therapeutic stimulation, a thread the planned Interventional and Neurostimulation series will develop. It remains a research instrument.
The cautionary tale: EEG-guided prescribing
As with combinatorial pharmacogenomics, the commercial layer has run ahead of the evidence. Products promising to match patients to specific medications from their resting EEG (referenced-EEG, or qEEG-guided prescribing) have been marketed for years. The supporting evidence is weak and independent replication poor, a direct parallel to the GUIDED-trial overreach described in the genetic and pharmacogenomic document. The decisive appraisal came from Widge and colleagues (2019), whose meta-analysis of EEG biomarkers for antidepressant response concluded that the existing evidence does not support clinical use: effect sizes were inflated by small studies, and no marker met the bar for guiding individual treatment. This is the electrophysiological counterpart to the imaging and genetic reckonings, and it is load-bearing. The EEG predictive literature, like its neighbors, has been distorted by small samples and analytic flexibility.
The convergence
Electrophysiology threads through this library at the level of circuit dynamics. The rACC theta signal converges with neuroimaging on the same response-predictive anatomy; the LDAEP ties to serotonergic function and the reward positivity to dopaminergic anhedonia; TMS-EEG measures of cortical excitability bear on the glutamatergic account; and sleep-EEG abnormalities link to the sleep and circadian literatures. As a biomarker class, its closest methodological sibling is digital phenotyping — both are cheap, scalable, and limited by the same overfitting trap — with the synthesis in the series capstone.
Caveats — load-bearing, not decorative
EEG's strengths and weaknesses are two sides of the same coin. Its low spatial resolution means source localization (as in rACC theta) involves modeling assumptions that introduce uncertainty. It is artifact-prone — muscle, eye movement, and electrode quality degrade signal, particularly outside research settings. Test–retest reliability is a recurring concern for several proposed markers, undermining their use as individual-level indices. The predictive literature suffers the familiar small-sample inflation that the Widge meta-analysis exposed. And the proprietary, non-transparent algorithms behind several commercial offerings make independent validation impossible, which is itself disqualifying for a clinical test.
The constructive counterpoint: EEG's scalability is genuine, and if a marker like rACC theta survives prospective validation in large, multi-site samples, EEG could become the first biomarker class to reach community-scale deployment — precisely because it does not depend on the expensive infrastructure that confines imaging and PET to specialist centers.
Bottom line
Electrophysiology is the most deployable biomarker modality in psychiatry and contains its most credible cross-validated predictor of antidepressant response in rostral ACC theta, whose appearance across EEG, PET, and fMRI is genuinely reassuring. It also contains, in EEG-guided prescribing, a clear instance of commercial claims outrunning evidence, and the Widge meta-analysis is the disciplined reader's reference point for why no EEG marker is yet ready for individual treatment decisions. The realistic near-term value lies in early-response prediction signals like rACC theta and in TMS-EEG target engagement for stimulation therapies — promising research programs, not clinic-ready tests, and a class whose deployability makes prospective validation especially worth pursuing.
Selected references
- Pizzagalli DA, et al. Pretreatment rostral anterior cingulate cortex theta activity in relation to symptom improvement in depression (EMBARC). JAMA Psychiatry. 2018.
- Pizzagalli DA. Frontocingulate dysfunction in depression: toward biomarkers of treatment response. Neuropsychopharmacology. 2011.
- Widge AS, et al. Electroencephalographic biomarkers for treatment response prediction in major depressive illness: a meta-analysis. Am J Psychiatry. 2019.
- Leuchter AF, Cook IA, et al. Changes in brain function during administration of venlafaxine or placebo (theta cordance). Psychiatry Res. 2009.
- Bares M, et al. Early reduction in prefrontal theta cordance and treatment outcome in depression. J Psychiatr Res. 2007.
- Tenke CE, Kayser J, et al. Frontal EEG alpha asymmetry and treatment response in depression. Biol Psychol. 2011.
- Davidson RJ. Anterior cerebral asymmetry and the nature of emotion. Brain Cogn. 1992.
- Juckel G, et al. Loudness dependence of the auditory evoked N1/P2 component as an indicator of serotonergic function. Neuropsychopharmacology. 2004.
- Hajcak G, et al. The error-related negativity as a transdiagnostic marker of internalizing psychopathology. Curr Dir Psychol Sci. 2017.
- Proudfit GH. The reward positivity: from basic research on reward to a biomarker for depression. Psychophysiology. 2015.
- Olbrich S, Arns M. EEG biomarkers in major depressive disorder: discriminative power and prediction of treatment response. Int Rev Psychiatry. 2013.
- Arns M, et al. EEG-based personalized medicine in psychiatry (iSPOT-D). Clin EEG Neurosci. 2016.
- Bruder GE, et al. Electroencephalographic and perceptual asymmetry differences between responders and nonresponders to antidepressants. Biol Psychiatry. 2001.
- Iosifescu DV. Electroencephalography-derived biomarkers of antidepressant response. Harv Rev Psychiatry. 2011.
- Ilmoniemi RJ, Kicić D. Methodology for combined TMS and EEG. Brain Topogr. 2010.
- Tremblay S, et al. Clinical utility and prospective of TMS-EEG. Clin Neurophysiol. 2019.
- Mulert C, et al. Rostral anterior cingulate cortex activity and prediction of treatment response. Int J Neuropsychopharmacol. 2007.
- Kupfer DJ. REM latency: a psychobiologic marker for primary depressive disease. Biol Psychiatry. 1976.
- Wade EC, Iosifescu DV. Using clinical and electrophysiologic measures to predict treatment outcomes in depression. Curr Psychiatry Rep. 2016.
- Cook IA, et al. Quantitative EEG biomarkers in the prediction of treatment response. J Affect Disord. 2013.
Neuroendocrine and Metabolic Biomarkers in Psychiatry
Diagnostics & Biomarkers series
The proposition
This document covers the biomarkers with the longest history in psychiatry and, in one case, the field's original cautionary tale. Neuroendocrine and metabolic measures are attractive for the same reasons inflammatory markers are: they are inexpensive, widely available in general medicine, and mechanistically tied to the HPA axis, metabolic, and hormonal accounts of depression. They are also the clearest place where two truths coexist: some of these markers identify reversible contributors that every clinician should check, while others — most famously the dexamethasone suppression test — taught psychiatry decades ago how a promising biomarker fails the test of specificity.
The original cautionary tale: the dexamethasone suppression test
The dexamethasone suppression test (DST) was, in the 1980s, the most studied biological test in psychiatry and the great hope for a laboratory diagnosis of melancholia. The logic followed directly from HPA-axis dysregulation: administer dexamethasone, a synthetic glucocorticoid that should suppress endogenous cortisol via negative feedback, and measure whether suppression fails. Roughly 40–50% of patients with severe or melancholic depression are non-suppressors, and Carroll and colleagues (1981) proposed the DST as a diagnostic marker.
The American Psychiatric Association task force review (1987) ended that hope: the DST's sensitivity was too low (many depressed patients suppress normally) and its specificity insufficient (non-suppression occurs in other psychiatric and medical conditions, and varies with weight, age, and intercurrent illness) to serve as a diagnostic test. The DST is the prototype for every subsequent biomarker disappointment in this series — the inflammatory, imaging, genetic, and EEG reckonings are all, in a sense, re-learnings of the DST lesson: a robust group-level association does not make an individual-level diagnostic test.
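The specificity argument is easiest to see through Bayes' rule. The Python sketch below converts sensitivity, specificity, and prevalence into predictive values and likelihood ratios. The sensitivity of 0.45 echoes the 40–50% non-suppression rate quoted above. The specificity of 0.80 and the 30% prevalence are hypothetical values chosen for illustration, not figures from the APA task force report.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> dict[str, float]:
    """Bayes' rule for a binary test: predictive values and likelihood ratios."""
    tp = sensitivity * prevalence              # true positives (as population fractions)
    fn = (1 - sensitivity) * prevalence        # false negatives
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)        # true negatives
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # Sensitivity echoes the 40-50% non-suppression rate; specificity and prevalence are hypothetical.
    for name, value in predictive_values(0.45, 0.80, 0.30).items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions a positive result would be right only about half the time (PPV ≈ 0.49). A negative result would still leave roughly a 23% chance of depression. This is why non-suppression could neither confirm nor exclude the diagnosis.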
Where the DST retains value is more circumscribed and more honest. Persistent non-suppression after treatment predicts higher relapse risk; non-suppression is more common in psychotic depression; and a body of work (Coryell; Mann and colleagues) links HPA hyperactivity to suicide risk. The combined dexamethasone/CRH test is more sensitive than the DST alone and remains a valuable research probe of corticosteroid-receptor function (Holsboer). These are legitimate uses — prognosis and mechanism — that do not require the marker to be diagnostic.
HPA-axis markers beyond the DST
The broader HPA literature offers several measures: basal cortisol (the meta-analytic signal is modest and heterogeneous; Stetler and Miller, 2011), the cortisol awakening response, and hair cortisol as an index of chronic exposure (Staufenbiel and colleagues). These connect biomarker assessment directly to the chronic stress and allostatic load account that this library treats as its upstream convergence point — hair cortisol, in particular, is the closest thing to a cumulative-stress biomarker. As with the DST, the limitation is specificity and state dependence rather than absence of signal.
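Cortisol awakening responses are often summarized with two trapezoid-rule areas. The area under the curve with respect to ground (AUCg) reflects total output. The area with respect to increase (AUCi) subtracts the area under the waking baseline and so isolates the rise. The Python sketch below assumes sampling at 0, 30, 45, and 60 minutes after awakening and uses hypothetical nmol/L values. It does not address adherence to sampling times, which matters greatly in practice.

```python
def auc_ground(times_min: list[float], values: list[float]) -> float:
    """Trapezoid-rule area under the curve relative to zero (AUCg)."""
    if len(times_min) != len(values) or len(values) < 2:
        raise ValueError("need two or more paired time/value samples")
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times_min, times_min[1:], values, values[1:])
    )


def auc_increase(times_min: list[float], values: list[float]) -> float:
    """AUCg minus the rectangle under the first (waking) sample (AUCi); can be negative."""
    return auc_ground(times_min, values) - values[0] * (times_min[-1] - times_min[0])


if __name__ == "__main__":
    times = [0.0, 30.0, 45.0, 60.0]     # minutes after awakening (assumed protocol)
    cortisol = [10.0, 18.0, 16.0, 12.0]  # hypothetical salivary cortisol, nmol/L
    print(f"AUCg = {auc_ground(times, cortisol):.0f}, AUCi = {auc_increase(times, cortisol):.0f}")
```

The two summaries can diverge. A person with high but flat cortisol has a large AUCg and an AUCi near zero, which is one reason CAR findings are hard to compare across studies.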
Thyroid: the genuinely actionable axis
Thyroid assessment is where neuroendocrine biomarkers earn their keep clinically. Overt and subclinical hypothyroidism can present as, exacerbate, or sustain depression; thyroid dysfunction is a recognized contributor to apparent treatment resistance; and thyroid status bears on the course of bipolar disorder, including rapid cycling. That last thread connects to the lithium and mood-stabilizer literature and to clinical frameworks emphasizing the medical substrate of mood disorders. TSH (with reflex free T4) is a standard, defensible part of the workup precisely because it identifies a reversible cause. This is the model for what a useful biomarker looks like: not a diagnosis of depression, but the detection of a treatable contributor.
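The TSH-first, reflex free T4 approach can be sketched as a small decision function. TSH is measured first, free T4 is added only when TSH falls outside its reference interval, and the pair is read as an overt or subclinical pattern. The reference intervals in this Python sketch (TSH 0.4–4.0 mIU/L, free T4 0.8–1.8 ng/dL) are placeholder assumptions, since real intervals are laboratory- and population-specific. The function also ignores central hypothyroidism, nonthyroidal illness, pregnancy, and drug effects such as lithium's. It labels laboratory patterns; it does not diagnose.

```python
from typing import Optional

# Placeholder reference intervals (ASSUMED); real intervals are laboratory-specific.
TSH_RANGE = (0.4, 4.0)  # mIU/L
FT4_RANGE = (0.8, 1.8)  # ng/dL


def classify_thyroid(tsh: float, free_t4: Optional[float] = None) -> str:
    """Label a TSH-first, reflex free T4 result pattern (not a diagnosis)."""
    tsh_lo, tsh_hi = TSH_RANGE
    ft4_lo, ft4_hi = FT4_RANGE
    if tsh_lo <= tsh <= tsh_hi:
        return "TSH within range: no reflex free T4 needed"
    if free_t4 is None:
        return "TSH out of range: reflex free T4 needed"
    if tsh > tsh_hi:  # elevated TSH
        if free_t4 < ft4_lo:
            return "overt hypothyroidism pattern"
        if free_t4 <= ft4_hi:
            return "subclinical hypothyroidism pattern"
        return "discordant pattern: needs clinical review"
    # Suppressed TSH
    if free_t4 > ft4_hi:
        return "overt hyperthyroidism pattern"
    if free_t4 >= ft4_lo:
        return "subclinical hyperthyroidism pattern"
    return "discordant pattern: needs clinical review"


if __name__ == "__main__":
    for tsh, ft4 in [(2.0, None), (8.0, None), (8.0, 0.5), (6.0, 1.2), (0.1, 2.5)]:
        print(tsh, ft4, "->", classify_thyroid(tsh, ft4))
```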
Metabolic markers
The metabolic biomarkers — fasting glucose, insulin and HOMA-IR, lipids, leptin, and adiponectin — index the bidirectional relationship between depression and metabolic dysfunction detailed in the metabolic etiology document. Their biomarker significance is twofold. First, they help define the proposed immunometabolic subtype of depression (Milaneschi, Penninx, and colleagues): patients with atypical, neurovegetative features — increased appetite and weight, hypersomnia, fatigue — show a distinct metabolic-inflammatory profile that overlaps substantially with the high-CRP group described in the inflammatory and immune document. Second, metabolic markers are themselves the chief confounder of inflammatory markers, since adiposity drives CRP — a reminder that these "separate" biomarker classes are measuring overlapping biology.
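Of the metabolic indices listed, HOMA-IR is the one most often computed by hand. The widely used original approximation multiplies fasting glucose by fasting insulin (µU/mL) and divides by 22.5 when glucose is in mmol/L, or by 405 when glucose is in mg/dL. The Python sketch below implements only that arithmetic. Insulin assays differ between laboratories and there is no universal cut-off, so the example inputs are hypothetical and no threshold is applied.

```python
def homa_ir(fasting_glucose: float, fasting_insulin: float, glucose_unit: str = "mmol/L") -> float:
    """Original HOMA approximation of insulin resistance.

    fasting_insulin is in microunits per mL; glucose_unit is 'mmol/L' or 'mg/dL'."""
    if glucose_unit == "mmol/L":
        return fasting_glucose * fasting_insulin / 22.5
    if glucose_unit == "mg/dL":
        return fasting_glucose * fasting_insulin / 405
    raise ValueError(f"unsupported glucose unit: {glucose_unit!r}")


if __name__ == "__main__":
    # Hypothetical fasting values; the two calls describe nearly the same glucose level.
    print(homa_ir(5.0, 9.0))           # glucose 5.0 mmol/L
    print(homa_ir(90.0, 9.0, "mg/dL"))  # glucose 90 mg/dL
```

The index compounds the confounding noted below: adiposity raises insulin and therefore HOMA-IR, which in turn tracks CRP.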
Sex steroids and neuroactive steroids
Reproductive and neuroactive steroids close the neuroendocrine survey and link to the hormonal account. Estrogen and progesterone fluctuations underlie premenstrual, peripartum, and perimenopausal mood vulnerability; the discovery that the allopregnanolone pathway is therapeutically tractable produced brexanolone and zuranolone, connecting this document to the GABAergic pharmacology account. As biomarkers, sex steroids are more useful for identifying hormonally timed depressive subtypes and reversible endocrine contributors than for diagnosis. Nutritional-endocrine markers (vitamin D, B12, folate) similarly function as detectors of reversible contributors (see nutritional factors) rather than as diagnostic tests.
The convergence
Neuroendocrine and metabolic markers sit at the center of this library's causal web. The HPA measures are the direct readout of the chronic stress upstream node; the metabolic markers overlap with inflammation and define the immunometabolic subtype; the thyroid and nutritional markers connect to reversible-contributor workup; and HPA-driven cortisol exposure feeds the hippocampal atrophy and impaired neurogenesis of the neuroplasticity/BDNF downstream hub. The synthesis appears in the series capstone.
Caveats — load-bearing, not decorative
The DST history is the master caveat: a marker can be strongly associated with severe depression at the group level and still fail as a diagnostic test because of inadequate sensitivity and specificity. State dependence affects nearly all of these measures — cortisol, metabolic, and thyroid markers shift with acute illness, stress, weight change, and time of day, complicating their use as stable trait indicators. Confounding is pervasive and bidirectional: adiposity drives both metabolic and inflammatory markers; antidepressants and antipsychotics themselves induce metabolic change. And the reversible-contributor versus diagnostic distinction must be kept sharp — checking TSH and B12 because they identify treatable causes is sound medicine; interpreting any of these markers as confirming or excluding a depressive diagnosis is not supported.
Bottom line
Neuroendocrine and metabolic biomarkers occupy two honest roles. As detectors of reversible contributors — thyroid dysfunction, B12 and vitamin D deficiency — they are a defensible, guideline-consistent part of clinical assessment, and the clearest example in this whole series of biomarkers with present-day clinical value. As diagnostic or stratifying markers for depression itself, they repeat the DST's lesson: real group-level signal, insufficient individual-level specificity. The most promising research direction is the immunometabolic subtype, where metabolic, inflammatory, and atypical-symptom profiles converge on a biologically coherent group — but that remains a stratification hypothesis, not a clinical test. Use these markers to find treatable contributors and to probe mechanism; do not ask them to diagnose.
Selected references
- Carroll BJ, et al. A specific laboratory test for the diagnosis of melancholia: the dexamethasone suppression test. Arch Gen Psychiatry. 1981.
- APA Task Force on Laboratory Tests in Psychiatry. The dexamethasone suppression test: an overview of its current status in psychiatry. Am J Psychiatry. 1987.
- Nelson JC, Davis JM. DST studies in psychotic depression: a meta-analysis. Am J Psychiatry. 1997.
- Holsboer F. The corticosteroid receptor hypothesis of depression. Neuropsychopharmacology. 2000.
- Pariante CM, Lightman SL. The HPA axis in major depression: classical theories and new developments. Trends Neurosci. 2008.
- Stetler C, Miller GE. Depression and hypothalamic-pituitary-adrenal activation: a quantitative summary of four decades of research. Psychosom Med. 2011.
- Staufenbiel SM, et al. Hair cortisol, stress exposure, and mental health: a systematic review. Psychoneuroendocrinology. 2013.
- Coryell W, Schlesser M. The dexamethasone suppression test and suicide prediction. Am J Psychiatry. 2001.
- Mann JJ, Currier D. Stress, genetics and epigenetic effects on the neurobiology of suicidal behavior and depression. Eur Psychiatry. 2010.
- Vreeburg SA, et al. Major depressive disorder and hypothalamic-pituitary-adrenal axis activity. Arch Gen Psychiatry. 2009.
- Penninx BWJH, et al. Understanding the somatic consequences of depression: biological mechanisms and the role of depression symptom profile. BMC Med. 2013.
- Milaneschi Y, et al. Depression heterogeneity and its biological underpinnings: toward immunometabolic depression. Biol Psychiatry. 2020.
- Lamers F, et al. Metabolic and inflammatory markers: associations with individual depressive symptoms. Psychol Med. 2018.
- Bauer M, Whybrow PC. Thyroid hormone, neural tissue and mood modulation. World J Biol Psychiatry. 2001.
- Joffe RT. Hormone treatment of depression. Dialogues Clin Neurosci. 2011.
- Schatzberg AF, et al. HPA axis function in psychotic major depression. Mol Psychiatry. 2014.
- Schüle C, Nothdurfter C, Rupprecht R. The role of allopregnanolone in depression and anxiety. Prog Neurobiol. 2014.
- Belmaker RH, Agam G. Major depressive disorder. N Engl J Med. 2008.
- Anglin RE, et al. Vitamin D deficiency and depression in adults: systematic review and meta-analysis. Br J Psychiatry. 2013.
- Mocking RJT, et al. Biological profiling of prospective antidepressant response. Psychoneuroendocrinology. 2017.
Digital Phenotyping and Computational Biomarkers in Psychiatry
Diagnostics & Biomarkers series
The proposition
Every biomarker discussed so far asks a patient to come to a laboratory and yields a snapshot: a blood draw, a scan, a single morning's cortisol. Digital phenotyping inverts this. As defined by Onnela and Rauch, the term denotes the moment-by-moment, in-situ quantification of behavior using data from the personal digital devices people already carry. Its premise is that smartphones and wearables continuously record the very things psychiatry struggles to measure (activity, sleep, mobility, social contact, speech, even typing) and that depression and related disorders leave fingerprints in these data. If realized, this would be the most ecologically valid biomarker class in psychiatry, capturing real-world function rather than a clinic-room approximation of it. It also raises the most serious ethical problems in this entire series, and those problems are not peripheral to the science but central to whether it can be deployed at all.
The evidence: passive sensing
The richest passive signal is movement and rhythm. Actigraphy and accelerometry capture reduced overall activity and, more informatively, disrupted circadian rest–activity rhythms — a behavioral readout of the circadian and sleep accounts. GPS-derived mobility metrics track depressive symptoms: in early work (Saeb and colleagues, 2015), reduced mobility, fewer location changes, and more time spent at home correlated with greater symptom severity — a quantified version of the social withdrawal and psychomotor slowing clinicians recognize, linking to the social/environmental and dopaminergic anhedonia accounts.
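GPS mobility features of this kind are simple summaries of a location trace. The minimal sketch below assumes a list of (latitude, longitude) samples taken at regular intervals and a known home coordinate, and computes two illustrative features: a log location variance and the fraction of samples near home. It is written in the spirit of features used in this literature, not as a reproduction of Saeb and colleagues' pipeline; the 100 m home radius, the sampling scheme, and all function names are assumptions. Variance in raw degrees is also a crude proxy, because a degree of longitude shrinks with latitude.

```python
import math
from statistics import pvariance

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def location_variance(samples):
    """log(var(lat) + var(lon)) in degrees; lower values mean less spatial spread."""
    lats = [lat for lat, _ in samples]
    lons = [lon for _, lon in samples]
    total = pvariance(lats) + pvariance(lons)
    return math.log(total) if total > 0 else float("-inf")

def home_stay_fraction(samples, home, radius_m=100.0):
    """Fraction of samples within radius_m of home (the radius is a hypothetical threshold)."""
    near = sum(1 for lat, lon in samples if haversine_m(lat, lon, *home) <= radius_m)
    return near / len(samples)

# Two made-up days of 24 hourly samples around an arbitrary home coordinate.
home = (47.6062, -122.3321)
housebound = [home] * 22 + [(47.6063, -122.3322)] * 2
active = [home] * 12 + [(47.62, -122.35), (47.65, -122.30), (47.60, -122.40)] * 4
print(home_stay_fraction(housebound, home), home_stay_fraction(active, home))  # 1.0 0.5
print(location_variance(housebound) < location_variance(active))  # True
```

Features like these only describe behavior; a low location variance on a given day could reflect mood, weather, illness, or simply a day off, which is why they are interpreted against symptom measures rather than on their own.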
The strongest prospective application is relapse prediction in bipolar disorder. Smartphone-collected activity, sleep, and communication data are associated with illness activity and may help flag emerging mood episodes (Faurholt-Jepsen and colleagues), and changes in keystroke dynamics track mood states. Because bipolar relapse has a behavioral signature that often precedes subjective awareness, passive sensing is genuinely promising here, arguably the use case with the clearest near-term clinical logic.
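Relapse monitoring of this kind is usually framed as change from a person's own baseline rather than comparison with population norms. The minimal sketch below assumes one daily summary value (for example, hours of sleep) and a hypothetical rule: flag a day when its z-score against the preceding 14 observed days exceeds 2 in absolute value for 3 consecutive days. The window, threshold, run length, and function name are illustrative assumptions, not validated parameters from the bipolar literature. Missing days (None) are skipped rather than imputed.

```python
from statistics import mean, stdev

def flag_deviations(daily, window=14, z_threshold=2.0, min_run=3):
    """Return indices of days that complete a run of at least min_run consecutive
    days whose |z| against the preceding `window` days exceeds z_threshold.
    None marks a missing day: it is excluded from baselines and resets the run."""
    flags = []
    run = 0
    for i, value in enumerate(daily):
        history = [v for v in daily[max(0, i - window):i] if v is not None]
        # Require at least half a window of observed baseline days.
        if value is None or len(history) < window // 2:
            run = 0
            continue
        sd = stdev(history)
        if sd == 0:
            run = 0
            continue
        z = (value - mean(history)) / sd
        run = run + 1 if abs(z) > z_threshold else 0
        if run >= min_run:
            flags.append(i)
    return flags

# Made-up data: two weeks of 7-8 h sleep, then four nights of 4 h.
sleep_hours = [7.0, 8.0] * 7 + [4.0] * 4
print(flag_deviations(sleep_hours))  # [16]
```

The output also exposes a design weakness: the flag fires on the third short night but not the fourth, because the short nights have already entered the rolling baseline and inflated its spread. A sustained shift gradually becomes the new normal, so any real system would need a more careful baseline strategy, and the missingness problem discussed below limits how often a complete window is even available.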
The evidence: speech, language, and active sampling
Speech is a dense behavioral biomarker. Acoustic features (reduced pitch variability, slowed rate, increased pause duration) index the psychomotor and prosodic changes of depression, and automated analysis can detect depressive states with moderate accuracy (Cummins, Low, and colleagues). Natural-language analysis adds a content channel: increased first-person singular pronoun use and, notably, elevated use of absolutist words (Al-Mosaiwi and Johnstone, 2018) characterize the language of depression and anxiety. Active sampling, meaning ecological momentary assessment or brief in-the-moment self-reports, complements passive data and commonly serves as the reference standard against which passive signals are calibrated.
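Dictionary-based language features like these reduce to counting tokens that fall in a word list. The minimal sketch below assumes simple regex tokenization and deliberately short, hypothetical word lists; the published absolutist dictionary is longer and was constructed and validated by the study authors, and the function name is our own.

```python
import re

# Hypothetical, abbreviated word lists for illustration only.
ABSOLUTIST = {"always", "never", "nothing", "completely", "entirely",
              "totally", "everything", "constantly"}
FIRST_PERSON_SINGULAR = {"i", "me", "my", "mine", "myself"}

def word_rates(text: str) -> dict[str, float]:
    """Return the proportion of tokens that fall in each word list."""
    # Lowercase alphabetic tokens; apostrophes are kept, so "I'm" becomes "i'm"
    # and is not counted as "i" (a known limitation of this naive tokenizer).
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"absolutist": 0.0, "first_person_singular": 0.0}
    n = len(tokens)
    return {
        "absolutist": sum(t in ABSOLUTIST for t in tokens) / n,
        "first_person_singular": sum(t in FIRST_PERSON_SINGULAR for t in tokens) / n,
    }

print(word_rates("I always feel like nothing I do matters."))
# {'absolutist': 0.25, 'first_person_singular': 0.25}
```

Rates like these are descriptive features, not diagnoses; the same specificity and generalization limits that apply to passive sensor data apply to them.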
Use cases and the deployability argument
Digital phenotyping's appeal is its potential reach. It uses existing consumer hardware, generates continuous longitudinal data rather than cross-sectional snapshots, and measures real-world function, the outcome that ultimately matters. Its plausible near-term roles are continuous measurement-based care, early relapse detection (especially in bipolar disorder), and the triggering of just-in-time interventions. It is also the biomarker class most directly relevant to consumer and digital mental health tools, where behavioral signals from a phone are the native data substrate. That proximity raises the validation and ethical stakes, precisely because deployment can outpace evidence in a lightly regulated commercial space.
The convergence
Digital phenotyping operationalizes several of this library's mechanisms as observable behavior: rest–activity disruption is the behavioral face of the circadian and sleep accounts; reduced mobility and social contact externalize the social/environmental account; psychomotor and reward-related changes connect to dopaminergic anhedonia. As a biomarker class, its closest methodological sibling is electrophysiology — both cheap, scalable, and vulnerable to the same overfitting trap — and it shares the reproducibility problem quantified for neuroimaging. The synthesis is in the series capstone.
Caveats — load-bearing, not decorative
Privacy and ethics are the dominant caveat, not a footnote. Passive sensing means continuous collection of location, communication metadata, and behavior — among the most sensitive data a person generates. This raises hard problems of meaningful consent (continuous monitoring strains the notion of informed agreement), data security, the risk of surveillance and commercial exploitation, and the asymmetry of power when a clinician or company can infer mood from a phone. These are not solved problems, and any deployment that treats them as secondary is ethically and, increasingly, legally untenable.
The scientific caveats mirror the rest of the series. Reproducibility and generalization are weak: models trained on one sample, device, or context transfer poorly, and the small-sample overfitting that plagues imaging and EEG applies with equal force to high-dimensional behavioral data. Specificity is low: behavior is multiply determined, and a drop in mobility may reflect weather, physical illness, a holiday, or unemployment rather than mood. Missingness and engagement degrade the signal — active sampling suffers attrition, and passive streams have gaps. And the digital divide — unequal access to devices and connectivity — means a biomarker built on consumer hardware risks working best for those who need it least. Regulatory status remains mostly that of wellness or research tools rather than validated diagnostics.
Bottom line
Digital phenotyping is the most ecologically valid and potentially the most scalable biomarker class in psychiatry, and in bipolar relapse prediction it has the clearest near-term clinical logic of anything in this series. It is also where the science and the ethics are most tightly entangled: the same continuous behavioral data that make it powerful make it uniquely hazardous, and privacy, consent, and equity are first-order scientific constraints, not compliance afterthoughts. The disciplined position is to treat current digital phenotyping as a promising research and measurement-support tool — valuable for tracking change and flagging relapse in monitored, consented settings — while withholding the confidence that its commercial framing often implies, and while insisting that validation and ethical infrastructure advance together rather than letting deployment run ahead of both.
Selected references
- Onnela JP, Rauch SL. Harnessing smartphone-based digital phenotyping to enhance behavioral and mental health. Neuropsychopharmacology. 2016.
- Torous J, et al. New tools for new research in psychiatry: a scalable and customizable platform to empower data driven smartphone research. JMIR Ment Health. 2016.
- Insel TR. Digital phenotyping: technology for a new science of behavior. JAMA. 2017.
- Saeb S, et al. Mobile phone sensor correlates of depressive symptom severity in daily-life behavior. J Med Internet Res. 2015.
- Faurholt-Jepsen M, et al. Behavioral activities collected through smartphones and the association with illness activity in bipolar disorder. Int J Methods Psychiatr Res. 2016.
- Mohr DC, Zhang M, Schueller SM. Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annu Rev Clin Psychol. 2017.
- De Choudhury M, et al. Predicting depression via social media. Proc ICWSM. 2013.
- Al-Mosaiwi M, Johnstone T. In an absolute state: elevated use of absolutist words is a marker specific to anxiety, depression, and suicidal ideation. Clin Psychol Sci. 2018.
- Cummins N, et al. A review of depression and suicide risk assessment using speech analysis. Speech Commun. 2015.
- Low DM, et al. Automated assessment of psychiatric disorders using speech: a systematic review. Laryngoscope Investig Otolaryngol. 2020.
- Ben-Zeev D, et al. CrossCheck: integrating self-report, behavioral sensing, and smartphone use to identify digital indicators of psychotic relapse. Psychiatr Rehabil J. 2017.
- Wang R, et al. StudentLife: assessing mental health, academic performance and behavioral trends using smartphones. Proc UbiComp. 2014.
- Barnett I, et al. Relapse prediction in schizophrenia through digital phenotyping. Neuropsychopharmacology. 2018.
- Huckvale K, Venkatesh S, Christensen H. Toward clinical digital phenotyping: a timely opportunity to consider purpose, quality, and safety. NPJ Digit Med. 2019.
- Martinez-Martin N, et al. Data mining for health: staking out the ethical territory of digital phenotyping. NPJ Digit Med. 2018.
- Birk RH, Samuel G. Can digital data diagnose mental health problems? A sociological exploration of 'digital phenotyping'. Sociol Health Illn. 2020.
- Jacobson NC, et al. Digital biomarkers of mood disorders and symptom change. NPJ Digit Med. 2019.
- Torous J, et al. Smartphones, sensors, and machine learning to advance real-time prediction and interventions for suicide prevention. Curr Psychiatry Rep. 2018.
- Place S, et al. Behavioral indicators on a mobile sensing platform predict clinically validated psychiatric symptoms. J Med Internet Res. 2017.
- Bzdok D, Meyer-Lindenberg A. Machine learning for precision psychiatry: opportunities and challenges. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018.
This article is for education only and is not medical advice, diagnosis, or treatment. Always talk with a qualified professional about your situation.