Differential Expression Analysis with DESeq2: Practical Lessons from Real-World RNA-seq Projects

DESeq2 (Love et al., 2014) is the workhorse of RNA-seq analysis — but most analyses using it are only superficially correct.

The difference between a competent DESeq2 analysis and a rigorous one lies in the details. Small decisions in design, normalization, and filtering can dramatically change which genes you call significant. Here are practical lessons we’ve learned from running DESeq2 across dozens of research projects.

Getting the Design Formula Right

The design formula defines the statistical question you’re asking — and it’s where most analyses go wrong. DESeq2’s ~ condition is the simplest case, but real experiments are rarely that simple.

Multi-Factor Designs

If your experiment includes both a treatment variable and a batch variable, your design should be ~ batch + treatment. The order matters: variables listed first are “controlled for,” and the last variable is the one tested for differential expression.

Common mistakes:

Omitting batch effects: If samples were processed on different days, by different operators, or on different flow cells, ignoring batch is a source of false positives and false negatives
Including too many covariates: With small sample sizes (n < 5 per group), adding multiple covariates consumes degrees of freedom and reduces power. Only include covariates with demonstrated effects.
Nesting errors: If treatment is perfectly confounded with batch (all treated samples in batch 1, all controls in batch 2), no statistical method can separate the effects. This is an experimental design problem, not an analysis problem.

Interaction Terms

For experiments asking “does the treatment effect differ between genotypes?” you need an interaction term: ~ genotype + treatment + genotype:treatment. The interaction coefficient captures the difference in treatment effect between genotypes — a subtle but important distinction from the main effects alone.

Normalization: Trust DESeq2 (Mostly)

DESeq2’s median-of-ratios normalization (Love et al., 2014) handles most datasets well. The method assumes that most genes are not differentially expressed — a reasonable assumption in most experiments, but one that can break down. Edge cases to watch for:

When It Breaks Down

Highly asymmetric changes: If most differentially expressed genes go in one direction (e.g., a global transcriptional shutdown), the median-of-ratios assumption is violated. Consider using spike-in normalization, TMM (Robinson & Oshlack, 2010), or a curated set of housekeeping genes.
Zero-inflated data: Spatial transcriptomics and low-input RNA-seq often produce many zeros. DESeq2 handles moderate zero inflation, but extreme sparsity (>80% zeros) may benefit from specialized tools (e.g., ZINB-WaVE (Risso et al., 2018) + DESeq2).

Variance Stabilizing Transformation (VST)

For downstream analyses like PCA, heatmaps, and clustering, always use vst() or rlog() rather than raw or normalized counts. VST stabilizes variance across expression levels, making high- and low-expression genes more comparable in exploratory analyses. VST is faster and preferred for most datasets (n > 30 samples); rlog is better for very small datasets (n < 10).

A common mistake: using VST-transformed values for the differential expression test itself. Don’t — DESeq2 operates on raw counts internally and applies its own normalization. VST is only for visualization and exploratory analysis.

Sample Exclusion: When and How

Outlier Detection

PCA plots are your first line of defense. A sample that clusters far from its biological group may be:

A sample swap (check sex-linked genes if applicable)
A technical failure (check library QC metrics)
Biologically unusual (a genuine outlier)

DESeq2’s Cook’s distance-based outlier detection is conservative — it flags individual genes, not entire samples. For whole-sample outliers, you’ll need to make a judgment call. But be careful: over-removing outliers can artificially improve results while reducing biological validity.

Sensitivity Analysis

When excluding samples, always run the analysis both with and without the outlier. If your key conclusions hold either way, the exclusion doesn’t matter much. If they change, you have a fragile result that depends on a single data point — which is important to acknowledge.

Contrasts: Getting Exactly What You Want

For experiments with more than two groups, specifying the right contrasts is critical. DESeq2 offers two approaches:

`results(dds, contrast = c("condition", "treated", "control"))`

This extracts a specific pairwise comparison. It’s the most common and most intuitive approach.

`results(dds, name = "conditiontreated")`

This extracts a model coefficient directly. It’s equivalent for simple designs but differs for designs with interaction terms, where the coefficient represents the effect in the reference level of the other variable.

The key is to ensure your contrast matches the biological question — not just the model structure.

All Pairwise Comparisons

For experiments with many groups (e.g., 10 treatment conditions), generating all pairwise comparisons systematically is important for completeness:

combs <- combn(levels(dds$condition), 2)
results_list <- apply(combs, 2, function(pair) {
  results(dds, contrast = c("condition", pair[1], pair[2]))
})

Just remember: when performing many comparisons, controlling the false discovery rate (Benjamini & Hochberg, 1995) across all tests — not just within each one — is critical to avoid inflated significance.

Visualization Best Practices

Visualization is not just presentation — it’s a diagnostic tool. Every plot should help you evaluate whether your analysis is working correctly.

Volcano Plots

The classic log2FC vs. -log10(padj) plot. Our tips:

Use lfcShrink() for effect size estimation — it produces more accurate fold changes, especially for low-count genes
Label the top genes, but not too many (10-20 max, or the plot becomes unreadable)
Use consistent axis limits across comparisons to enable visual comparison
Color-code by significance and fold change threshold, not just p-value

MA Plots

DESeq2’s built-in MA plot (log2FC vs. mean expression) reveals a pattern that volcano plots hide: whether your significant genes are concentrated among high-expression or low-expression genes. A preponderance of significant low-expression genes may indicate technical artifacts.

Heatmaps

For publication-quality heatmaps of differentially expressed genes:

Use VST-transformed values
Scale by row (gene), not by column (sample)
Annotate columns with experimental metadata (treatment, batch, etc.)
Order genes by biological pathway or clustering, not by p-value

The Replication Question

A finding from DESeq2 is only as strong as the biological replication behind it. We strongly advocate for:

Minimum 3 biological replicates per group (absolute minimum; 5+ is preferred)
Understanding what counts as a replicate: Technical replicates (same RNA, sequenced twice) are not biological replicates and should be merged, not treated as independent samples
Power analysis before the experiment: DESeq2 won’t find real effects if you’re underpowered. Tools like RNASeqPower (Hart et al., 2013) can estimate required sample sizes for a given effect size and sequencing depth.

Gene Set Enrichment: The Essential Next Step

Individual gene lists are hard to interpret. Running GSEA (Subramanian et al., 2005) or ORA (Over-Representation Analysis) on DESeq2 results connects your findings to biological pathways and known gene sets.

We recommend fGSEA (Korotkevich et al., 2021), a fast implementation of the GSEA algorithm, for ranked gene list analysis, using the Wald statistic from DESeq2 as the ranking metric. MSigDB’s Hallmark gene sets (Liberzon et al., 2015) provide a curated, non-redundant collection that covers major biological processes.

Key tip: always run GSEA on the full ranked gene list, not just the significant genes. GSEA’s power comes from detecting coordinated changes across a gene set, including genes that individually don’t reach significance.

Common Gotchas

Pre-filtering: Remove genes with very low counts before running DESeq2 (e.g., keep genes with at least 10 reads in at least 3 samples). This speeds up computation and reduces the multiple testing burden without affecting results for genes you care about.
Independent filtering: DESeq2 automatically performs independent filtering to maximize the number of discoveries at a given FDR. This is good — don’t disable it.
NA padj values: Genes with NA in the padj column were either filtered out by independent filtering or had extreme outliers (high Cook’s distance). Don’t treat these as non-significant — they’re explicitly “not tested.”
Log2FC of zero: A gene with log2FC = 0 and padj = 1 wasn’t affected by your treatment. A gene with log2FC = 0.001 and padj = 0.03 probably wasn’t either — statistical significance at tiny effect sizes is rarely biologically meaningful.

Wrapping Up

DESeq2 is powerful, well-documented, and battle-tested. But DESeq2 doesn’t make your analysis rigorous — your decisions do.

At Cytogence, differential expression analysis is a foundational component of nearly every RNA-seq and spatial transcriptomics project we support. If you need help designing your analysis, interpreting results, or troubleshooting unexpected findings, we’re here to help.

References

Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology. 2014;15(12):550. doi: 10.1186/s13059-014-0550-8. PMID: 25516281.
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology. 2010;11(3):R25. doi: 10.1186/gb-2010-11-3-r25. PMID: 20196867.
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert JP. A general and flexible method for signal extraction from single-cell RNA-seq data. Nature Communications. 2018;9:284. doi: 10.1038/s41467-017-02554-5. PMID: 29348443.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B. 1995;57(1):289-300. doi: 10.1111/j.2517-6161.1995.tb02031.x.
Hart SN, Therneau TM, Zhang Y, Poland GA, Kocher JP. Calculating sample size estimates for RNA sequencing data. Journal of Computational Biology. 2013;20(12):970-978. doi: 10.1089/cmb.2012.0283. PMCID: PMC3842884.
Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545-15550. doi: 10.1073/pnas.0506580102. PMID: 16199517.
Korotkevich G, Sukhov V, Budin N, Shpak B, Artyomov MN, Sergushichev A. Fast gene set enrichment analysis. bioRxiv. 2021. doi: 10.1101/060012.
Liberzon A, Birger C, Thorvaldsdottir H, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Systems. 2015;1(6):417-425. doi: 10.1016/j.cels.2015.12.004. PMID: 26771021.

Cytogence provides bioinformatics consulting for RNA-seq, spatial transcriptomics, and multi-omics research. Reach out to discuss your project.