Modeling Disease Progression with Spatial Transcriptomics

Disease progression is fundamentally a temporal process — but most omics data is static.

Spatial transcriptomics offers a way around that. In heterogeneous tissues, different regions often represent different stages of pathology. By profiling gene expression across these regions and ordering them along a disease axis, you can reconstruct a pseudo-temporal trajectory from a single tissue section — without ever needing longitudinal samples.

This can be thought of as spatial pseudotime modeling, where spatial regions act as proxies for disease progression. Tools like SpaceFlow (Ren et al., 2022), a Python package built on deep graph neural networks, have formalized this concept: they embed spatial transcriptomic data into low-dimensional representations that capture both expression similarity and spatial proximity, and derive pseudo-spatiotemporal maps from those embeddings. Spatial position is not a direct measure of time, but in heterogeneous tissues it often reflects underlying progression states, and that is often enough to reconstruct an approximate trajectory.

The Concept: Spatial Regions as Temporal Proxies

In many diseases, a single tissue section contains regions at different stages of pathology. Consider a kidney with progressive fibrosis: some regions appear normal, others show early signs of injury, and others are fully fibrotic. Ordering these regions from normal to fibrotic defines the disease axis, and the expression profile of each region becomes one point along a pseudo-temporal trajectory, with no need for longitudinal samples from the same patient.

This approach is particularly powerful for diseases where:

Tissue is heterogeneous: Different regions are at different disease stages (e.g., cystic kidney disease, tumor margins, fibrotic liver)
Longitudinal sampling is impractical: Serial biopsies are invasive or unethical
Spatial context matters: The relationship between diseased and adjacent tissue is biologically informative

Building a Disease Progression Model

Step 1: Define Disease Stages

The first step is classifying each ROI (region of interest) into a disease stage. Importantly, stage definitions should ideally be independent of the expression data used for downstream analysis to avoid circularity. Classification can be based on:

Histological assessment: Pathologist-defined grades (normal, early, moderate, severe)
Molecular markers: Expression of known disease-associated genes
Morphological features: Cyst size, fibrosis extent, immune infiltrate density

In our work, we typically define 3-5 stages to balance resolution with statistical power. Too few stages obscure the trajectory; too many create underpowered groups.

Step 2: Identify Trajectory-Associated Genes

With stages defined, you can identify genes whose expression changes systematically along the trajectory. We use several complementary approaches:

Monotonic trends: Spearman correlation between gene expression and disease stage (coded as an ordinal variable) identifies genes that consistently increase or decrease. A threshold around |ρ| > 0.4, combined with FDR correction of the correlation p-values, often provides a reasonable balance of sensitivity and specificity, though the optimal threshold depends on your dataset and sample size.
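As a rough illustration of the monotonic-trend filter, the sketch below computes a Spearman correlation per gene and applies a hand-written Benjamini-Hochberg adjustment. The function names, the 0.05 FDR cutoff, and the toy expression values are assumptions made for this example, not part of any specific pipeline.

```python
import numpy as np
from scipy.stats import spearmanr


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


def monotonic_genes(expr, stages, gene_names, rho_min=0.4, fdr=0.05):
    """expr: genes x ROIs array; stages: one ordinal stage code per ROI."""
    rhos, pvals = [], []
    for row in expr:
        rho, p = spearmanr(row, stages)
        rhos.append(rho)
        pvals.append(p)
    rhos = np.array(rhos)
    qvals = bh_adjust(pvals)
    keep = (np.abs(rhos) > rho_min) & (qvals < fdr)
    return {g: (float(r), float(q))
            for g, r, q, k in zip(gene_names, rhos, qvals, keep) if k}


# Toy data (made up for illustration): 12 ROIs, 4 per stage.
stages = np.repeat([0, 1, 2], 4)
expr = np.array([
    list(range(1, 13)),                      # rises with stage
    list(range(12, 0, -1)),                  # falls with stage
    [5, 1, 4, 2, 3, 6, 2, 5, 4, 1, 5, 3],    # no clear trend
], dtype=float)
hits = monotonic_genes(expr, stages, ["gene_up", "gene_down", "gene_flat"])
print(sorted(hits))
```

The sign of the correlation in each returned tuple separates increasing from decreasing genes. In a real analysis the input would be normalized expression per ROI rather than raw toy values.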

Non-monotonic patterns: Not all disease-relevant genes change monotonically. Some peak during transitional stages (for example, compensatory responses that fail as disease advances), while others dip and then recover (for example, metabolic processes that are temporarily suppressed and then reactivated). Pattern detection algorithms that classify genes into “peak” and “valley” categories can capture these dynamics.

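ANOVA across stages: One-way ANOVA tests whether mean expression differs between any of the stages, regardless of the shape of the pattern, so it also catches genes whose trajectory isn’t monotonic. Because it treats stages as unordered groups, it complements the trend tests rather than replacing them.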
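A minimal sketch of the per-gene ANOVA, using scipy.stats.f_oneway on expression values grouped by stage. The helper name and the toy values are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import f_oneway


def anova_across_stages(expr_row, stages):
    """One-way ANOVA for a single gene; returns (F statistic, p-value)."""
    values = np.asarray(expr_row, dtype=float)
    stages = np.asarray(stages)
    groups = [values[stages == s] for s in np.unique(stages)]
    f_stat, p_value = f_oneway(*groups)
    return float(f_stat), float(p_value)


# Toy data (made up): 4 ROIs per stage.
stages = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
peak_gene = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1, 1.0, 0.8, 1.1, 1.2]
flat_gene = [2.0, 2.2, 1.9, 2.1, 2.1, 1.9, 2.0, 2.3, 2.0, 2.1, 1.8, 2.2]

f_peak, p_peak = anova_across_stages(peak_gene, stages)
f_flat, p_flat = anova_across_stages(flat_gene, stages)
print(f"peak gene p={p_peak:.2e}, flat gene p={p_flat:.2f}")
```

The peak gene here has almost no monotonic trend, so a Spearman filter would miss it, while the ANOVA flags it. As with the correlation test, p-values across all genes would still need multiple-testing correction.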

Step 3: Classify Gene Patterns

Categorizing trajectory-associated genes into pattern classes reveals different biological programs:

Pattern	Description	Biological Interpretation
Monotonic Up	Steadily increases with disease	Fibrosis, inflammation, stress response
Monotonic Down	Steadily decreases with disease	Loss of differentiated function, metabolic decline
Peak	Rises then falls	Compensatory mechanisms that eventually fail
Valley	Falls then rises	Transient metabolic reprogramming

These patterns reflect different biological programs operating at distinct phases of disease. The pattern classes become the input for downstream pathway enrichment analysis, revealing which processes dominate each phase.
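One simple way to assign the four pattern classes is to look at per-stage mean expression, ordered from earliest to latest stage. The sketch below is a heuristic under stated assumptions: the min_change threshold and the extra "Flat" and "Complex" labels are additions made for this example and are not part of the table above.

```python
import numpy as np


def classify_pattern(stage_means, min_change=0.5):
    """Label a gene from its per-stage means (ordered early -> late)."""
    m = np.asarray(stage_means, dtype=float)
    if m.max() - m.min() < min_change:
        return "Flat"  # assumed label: change too small to classify
    diffs = np.diff(m)
    if np.all(diffs >= 0):
        return "Monotonic Up"
    if np.all(diffs <= 0):
        return "Monotonic Down"
    last = m.size - 1
    max_inside = 0 < int(np.argmax(m)) < last
    min_inside = 0 < int(np.argmin(m)) < last
    if max_inside and not min_inside:
        return "Peak"
    if min_inside and not max_inside:
        return "Valley"
    return "Complex"  # assumed label: more than one turning point


# Toy stage means (made up) for a four-stage design.
examples = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [4.0, 3.0, 2.0, 1.0],
    "gene_c": [1.0, 3.0, 3.5, 1.2],
    "gene_d": [3.0, 1.0, 0.8, 2.9],
    "gene_e": [2.0, 2.1, 2.0, 2.1],
}
patterns = {gene: classify_pattern(means) for gene, means in examples.items()}
print(patterns)
```

Because the rule works on means alone, small fluctuations can flip a label; in practice it makes sense to classify only genes that already passed a significance test across stages.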

Step 4: Pathway Enrichment by Pattern

Running Gene Ontology (Ashburner et al., 2000) or KEGG (Kanehisa & Goto, 2000) pathway enrichment separately on each pattern class yields biological narratives for each phase:

Monotonic Up genes might enrich for extracellular matrix organization, inflammatory signaling, and the TGF-beta pathway
Monotonic Down genes might enrich for solute transport, mitochondrial metabolism, and cell polarity
Peak genes might enrich for DNA repair, cell cycle, and autophagy — processes that are transiently activated before the cell’s capacity is overwhelmed
Valley genes might enrich for lipid metabolism or oxidative phosphorylation — metabolic programs temporarily shut down during acute injury
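Per-pattern enrichment is commonly run as an over-representation test. The sketch below uses a hypergeometric test from SciPy as a stand-in for a full GO or KEGG enrichment tool; the gene identifiers and pathway memberships are hypothetical placeholders, not real annotations.

```python
from scipy.stats import hypergeom


def enrichment_pvalue(pattern_genes, pathway_genes, background_genes):
    """Over-representation test: returns (overlap size, p-value)."""
    background = set(background_genes)
    pattern = set(pattern_genes) & background
    pathway = set(pathway_genes) & background
    overlap = len(pattern & pathway)
    # P(X >= overlap) when drawing len(pattern) genes from the background.
    p_value = hypergeom.sf(overlap - 1, len(background), len(pathway), len(pattern))
    return overlap, float(p_value)


# Hypothetical identifiers: 100 background genes, two 10-gene pathways.
background = [f"g{i}" for i in range(100)]
pathway_a = [f"g{i}" for i in range(10)]
pathway_b = [f"g{i}" for i in range(90, 100)]
monotonic_up = [f"g{i}" for i in range(8)] + ["g50", "g51"]

overlap_a, p_a = enrichment_pvalue(monotonic_up, pathway_a, background)
overlap_b, p_b = enrichment_pvalue(monotonic_up, pathway_b, background)
print(overlap_a, p_a, overlap_b, p_b)
```

The background should be the set of genes that could have been detected in the experiment, not the whole genome, and the resulting p-values need FDR correction across all pathways tested for each pattern class.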

Integrating Cell Type Deconvolution

Disease progression analysis becomes even more powerful when combined with spatial deconvolution. By tracking cell type proportions across disease stages, you can ask:

When does immune infiltration begin relative to tissue damage?
Do fibroblast proportions increase before or after epithelial cell loss?
Are specific immune cell populations (e.g., macrophage subtypes) enriched at particular disease stages?

Gene-level trajectories tell you what is changing. Cell-type trajectories tell you who is driving those changes. Together, they transform the trajectory from a gene list into a cellular narrative.
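To compare the timing of cell-type shifts, one option is to average deconvolved proportions per stage and record the first stage at which each cell type departs from its baseline. The sketch below assumes deconvolution has already produced one proportion vector per ROI; the 1.5-fold cutoff, the function names, and the toy proportions are illustrative assumptions.

```python
import numpy as np


def stage_mean_proportions(props, stages):
    """props: ROIs x cell types. Returns (stage labels, stages x cell types means)."""
    props = np.asarray(props, dtype=float)
    stages = np.asarray(stages)
    labels = np.unique(stages)
    means = np.vstack([props[stages == s].mean(axis=0) for s in labels])
    return labels, means


def onset_stage(stage_means, labels, fold=1.5, direction="up"):
    """First stage where the mean changes by `fold` relative to the first stage."""
    base = stage_means[0]
    for label, value in zip(labels[1:], stage_means[1:]):
        if direction == "up" and value >= fold * base:
            return int(label)
        if direction == "down" and value <= base / fold:
            return int(label)
    return None


# Toy proportions (made up): columns are epithelial, fibroblast, macrophage.
stages = [0, 0, 1, 1, 2, 2, 3, 3]
props = [
    [0.80, 0.10, 0.10], [0.78, 0.12, 0.10],
    [0.74, 0.10, 0.16], [0.72, 0.12, 0.16],
    [0.60, 0.22, 0.18], [0.58, 0.24, 0.18],
    [0.40, 0.40, 0.20], [0.38, 0.42, 0.20],
]
labels, means = stage_mean_proportions(props, stages)
epithelial_loss = onset_stage(means[:, 0], labels, direction="down")
fibroblast_gain = onset_stage(means[:, 1], labels, direction="up")
macrophage_gain = onset_stage(means[:, 2], labels, direction="up")
print(macrophage_gain, fibroblast_gain, epithelial_loss)
```

In this toy example macrophage expansion precedes fibroblast expansion, which precedes epithelial loss. With real data the onset calls should be backed by a statistical test per stage, since proportions from deconvolution are noisy.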

Cross-Species Validation

For studies using animal models, a critical question is: do the disease trajectory genes in my mouse model also change in human disease?

Ortholog mapping, which identifies one-to-one gene equivalents between species, enables direct comparison. In our experience, cross-species concordance is often modest: 50-70% of trajectory genes have detectable orthologs, and fewer show a concordant direction of change. But the genes that are concordant are likely to be among the most robust and translatable targets.
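The comparison described above can be sketched with plain dictionaries: map mouse trajectory genes to human orthologs, then count how many change in the same direction. The gene symbols, directions, and ortholog table below are hypothetical placeholders; a real analysis would load a curated one-to-one ortholog table.

```python
def cross_species_concordance(mouse_dirs, human_dirs, orthologs):
    """mouse_dirs / human_dirs: gene -> +1 (up) or -1 (down).
    orthologs: mouse gene -> human gene (one-to-one pairs only)."""
    mapped = {m: orthologs[m] for m in mouse_dirs if m in orthologs}
    tested = {m: h for m, h in mapped.items() if h in human_dirs}
    concordant = sorted(m for m, h in tested.items()
                        if mouse_dirs[m] == human_dirs[h])
    return {
        "mapped_fraction": len(mapped) / len(mouse_dirs),
        "concordant_fraction": (len(concordant) / len(tested)
                                if tested else float("nan")),
        "concordant_genes": concordant,
    }


# Hypothetical gene symbols and directions, for illustration only.
mouse_dirs = {"Aaa1": 1, "Bbb2": 1, "Ccc3": -1, "Ddd4": -1, "Eee5": 1}
orthologs = {"Aaa1": "AAA1", "Bbb2": "BBB2", "Ccc3": "CCC3", "Ddd4": "DDD4"}
human_dirs = {"AAA1": 1, "BBB2": -1, "CCC3": -1}

summary = cross_species_concordance(mouse_dirs, human_dirs, orthologs)
print(summary)
```

Reporting the mapped fraction and the concordant fraction separately keeps the two sources of loss apart: genes with no ortholog versus genes that map but disagree in direction.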

We’ve found that transcription factor targets, rather than individual genes, often show better cross-species conservation. Identifying master regulators (e.g., using tools like ChEA3; Keenan et al., 2019) and then checking whether their target gene sets are concordant across species can reveal conserved regulatory programs that individual gene comparisons would miss.

It’s worth noting that differences in tissue architecture and sampling between species can also contribute to apparent discordance — not all disagreement reflects true biological divergence.

Practical Considerations

Sample size: Disease progression analysis requires sufficient ROIs at each stage. We recommend a minimum of 5-8 ROIs per stage for robust statistical testing; additional ROIs per stage increase statistical power.
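A quick check before any testing is to count ROIs per stage and flag stages that fall below the chosen minimum. The helper name and the toy labels below are assumptions for illustration; the default of 5 mirrors the lower end of the recommendation above.

```python
from collections import Counter


def underpowered_stages(stage_labels, min_rois=5):
    """Return {stage: ROI count} for stages with fewer than min_rois ROIs."""
    counts = Counter(stage_labels)
    return {stage: n for stage, n in counts.items() if n < min_rois}


# Toy stage assignments (made up).
labels = ["normal"] * 8 + ["early"] * 6 + ["moderate"] * 3 + ["severe"] * 5
flagged = underpowered_stages(labels)
print(flagged)
```

A flagged stage is a candidate for merging with a neighboring stage or for additional ROI selection, depending on what the histology supports.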

Stage assignment bias: If stage classification is based on the same gene expression data used for trajectory analysis, there’s a risk of circularity. When possible, base stage classification on independent data (histology, orthogonal markers).

Transitional regions: ROIs that fall at the boundary between stages can be the most informative — or the most confusing. Consider analyzing boundary ROIs separately to understand the transition mechanisms.

Multiple trajectories: Not all diseases progress along a single axis. Some conditions have branching trajectories (e.g., different fibrosis patterns, varying immune responses). PCA or UMAP visualization of your ROIs can reveal whether a single trajectory model is appropriate.
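One way to sanity-check the single-axis assumption is to run PCA on the ROI expression matrix and ask how much variance the first component explains and how well it tracks the assigned stages. The sketch below computes PCA with NumPy's SVD on simulated data; the simulated loadings, noise level, and function name are assumptions for this example.

```python
import numpy as np
from scipy.stats import spearmanr


def pca_scores(expr, n_components=2):
    """expr: ROIs x genes. Returns (scores, explained variance ratios)."""
    centered = expr - expr.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return u[:, :n_components] * s[:n_components], explained[:n_components]


# Simulated data: 12 ROIs (3 per stage), 5 genes driven by one stage axis.
rng = np.random.default_rng(0)
stages = np.repeat([0, 1, 2, 3], 3)
loadings = np.array([1.0, -1.0, 0.5, 0.0, 2.0])
expr = np.outer(stages, loadings) + rng.normal(0, 0.1, size=(12, 5))

scores, explained = pca_scores(expr)
rho, _ = spearmanr(scores[:, 0], stages)
print(f"PC1 explains {explained[0]:.1%}; |rho| with stage = {abs(rho):.2f}")
```

The sign of a principal component is arbitrary, so only the absolute correlation matters. If PC1 tracks stage only weakly, or PC2 carries structure that separates ROIs within a stage, a branching model may fit better than a single trajectory.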

Core assumption: This entire approach assumes a coherent progression axis exists in your tissue. That may not hold in diseases with branching or parallel pathological processes. Always verify that your spatial ordering reflects a biologically meaningful continuum before building trajectory models on top of it.

The Promise of Spatial Trajectories

As spatial transcriptomics platforms increase in resolution and throughput, disease progression modeling will become increasingly central to understanding pathology. The ability to reconstruct temporal dynamics from spatial patterns — turning a single tissue section into a movie of disease — is one of the most exciting applications of this technology.

At Cytogence, we’ve developed workflows for spatial disease progression analysis that integrate trajectory modeling, cell type deconvolution, pathway enrichment, and cross-species validation. If you’re studying a progressive disease with spatially heterogeneous tissue, this approach can turn your GeoMx data from static snapshots into dynamic models of disease biology.

References

Ren H, Walker BL, Cang Z, Nie Q. Identifying multicellular spatiotemporal organization of cells with SpaceFlow. Nature Communications. 2022;13:4076. doi: 10.1038/s41467-022-31739-w. PMID: 35835774.
Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25-29. doi: 10.1038/75556. PMID: 10802651.
Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research. 2000;28(1):27-30. doi: 10.1093/nar/28.1.27. PMID: 10592173.
Keenan AB, Torre D, Lachmann A, et al. ChEA3: transcription factor enrichment analysis by orthogonal omics integration. Nucleic Acids Research. 2019;47(W1):W212-W224. doi: 10.1093/nar/gkz446. PMID: 31114921.
