What to Include in a Bioinformatics Statement of Work
A practical guide for researchers contracting bioinformatics analysis for the first time. What to define upfront, what to expect in deliverables, and how to avoid scope creep.
A bioinformatics project can go off track long before the analysis starts. In many cases, the problem is not the data — it’s the scope. Projects with poorly defined scope don’t just run late — they often produce results that can’t be used for publication.
You’ve generated your data, and you need a bioinformatician to analyze it. Maybe you’re hiring a consultant, engaging a core facility’s analysis service, or contracting with a company like Cytogence. Whichever route you take, you need a statement of work (SOW): a document that defines what will be done, what will be delivered, and what it will cost.
If you’ve never contracted bioinformatics work before, this guide will help you get it right the first time.
Why a Statement of Work Matters
Bioinformatics projects fail for predictable reasons:
- Undefined scope: “Analyze my RNA-seq data” can mean a 2-hour job or a 2-month job depending on what “analyze” means to each party.
- Missing metadata: The analyst discovers halfway through that sample group assignments are unclear, batch information is missing, or the experimental design is more complex than described.
- Scope creep: The PI asks for “one more comparison” after the analysis is complete — repeatedly. Each additional comparison is a new analysis, not a minor tweak.
- Unclear deliverables: The analyst delivers R scripts; the PI expected publication-ready figures and a methods section. Neither is wrong, but the expectation mismatch causes friction.
A good SOW prevents all of these. It’s not bureaucracy — it’s how serious projects stay on time, on budget, and scientifically aligned.
What a Good SOW Protects Both Sides From
A well-written SOW protects the client from surprise costs and unclear deliverables. It protects the analyst from open-ended scope and missing information. And it gives both parties a shared definition of “done” — which is often the most important thing to agree on before work begins.
Essential Elements
1. Biological Question
State the question you want answered — in biological terms, not statistical terms.
Vague: “Run differential expression on my samples.”
Clear: “Identify genes differentially expressed between drug-treated and vehicle-control tumors, with batch correction for processing date. We expect the treatment to affect inflammatory pathways, so pathway enrichment analysis for immune-related gene sets is also needed.”
The biological question drives every downstream decision — the statistical model, the comparisons, the visualizations, and the interpretation.
2. Data Description
Be specific about what data you’re providing:
- Platform: GeoMx DSP, bulk RNA-seq (Illumina NovaSeq), 10x Chromium scRNA-seq, etc.
- Format: FASTQ files, count matrices, DCC/PKC files, CSV exports
- Sample count: How many samples, how many groups, how many replicates per group
- Species: Human, mouse, rat, other
- Tissue type: FFPE, fresh-frozen, cell culture, organoids
- Known issues: Any samples that failed QC, were collected under unusual conditions, or have incomplete metadata
3. Experimental Design
This is the most important section, and the one most often left incomplete:
- Groups and comparisons: What are the experimental groups? What comparisons should be made? (Treatment vs. control? Across time points? Between tissue compartments?) What is the reference level / baseline group?
- Covariates: Are there batch effects, patient-level pairing, or other confounders that need to be accounted for?
- Replicates: How many biological replicates per group? Are there technical replicates that should be merged?
- Interaction effects: Are you looking for main effects only, or do you need interaction terms (e.g., “does the treatment effect differ by genotype”)?
- Exploratory vs. predefined: Are only the specified comparisons included, or is exploratory analysis (hypothesis-generating) also in scope?
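One way to make the comparison list concrete before signing is to enumerate it. The following is a minimal sketch, not a statistical design tool: the group names are hypothetical, and the helpers `contrasts_vs_reference` and `all_pairwise` are invented for this illustration. It shows how the number of two-group comparisons changes depending on whether only comparisons against the reference level are in scope, or every pair of groups is.

```python
from itertools import combinations

def contrasts_vs_reference(groups, reference):
    """Return (group, reference) pairs for every non-reference group."""
    if reference not in groups:
        raise ValueError(f"reference level {reference!r} is not one of the groups")
    return [(g, reference) for g in groups if g != reference]

def all_pairwise(groups):
    """Return every unordered pair of groups (the upper bound on two-group comparisons)."""
    return list(combinations(groups, 2))

# Hypothetical experimental groups; replace with the groups named in your SOW.
groups = ["vehicle", "low_dose", "high_dose", "combo"]

versus_reference = contrasts_vs_reference(groups, reference="vehicle")
every_pair = all_pairwise(groups)

print(len(versus_reference), "comparisons against the reference level")
print(len(every_pair), "comparisons if every pair of groups is in scope")
for treated, baseline in versus_reference:
    print(f"  {treated} vs. {baseline}")
```

Writing the resulting list into the SOW, rather than a phrase like "all relevant comparisons", gives both parties the same count to price and schedule against.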
4. Analysis Scope
Define specifically what analyses are included:
- Quality control and preprocessing
- Normalization method(s)
- Differential expression (specify comparisons)
- Pathway enrichment (specify databases: GO, KEGG, MSigDB, etc.)
- Clustering or deconvolution
- Visualization (volcano plots, heatmaps, PCA)
- Multi-omics integration (if applicable)
- Statistical testing beyond DE (survival analysis, correlation analysis, etc.)
Equally important: define what is not included. “Additional comparisons beyond those specified above will be scoped as separate work” is a sentence that prevents scope creep.
5. Deliverables
Be explicit about what you’ll receive — and at what level of polish. There’s a significant difference between:
- Raw results: Output files from the analysis pipeline (for analysts who want to explore further)
- Interpreted results: Figures with context, statistical summaries with biological interpretation
- Manuscript-ready results: Publication-quality figures, formatted tables, draft methods section, figure legends
Define which level you need:
- Figures: File formats (PNG, PDF, SVG), resolution (300 DPI for publication), whether interactive HTML figures are included
- Tables: Differential expression results (all genes or filtered?), pathway enrichment results, summary statistics
- Report: Narrative analysis report with methods, results, and figure legends? Or just raw output files?
- Code: Will the analysis code be provided for reproducibility? In what format (R Markdown, Jupyter notebook, scripts)?
- Methods section: A draft methods paragraph suitable for inclusion in a manuscript?
- Revisions: How many rounds of revision are included? What constitutes a “revision” vs. new analysis?
6. Timeline
Set realistic expectations:
- Data handoff: When will you provide the data and metadata?
- Analysis duration: Standard projects often take on the order of a few weeks, but timelines vary substantially with data type, sample count, metadata quality, and the number of requested deliverables.
- Review cycles: How long will the PI take to review results and provide feedback? Build this into the timeline. Delays in metadata handoff or review feedback will affect the overall schedule.
- Dependencies: Does the analysis depend on anything else (pathologist review, additional samples, collaborator input)?
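To see how review cycles extend a schedule, it can help to do the arithmetic explicitly. The sketch below assumes a simple model in which work proceeds on weekdays only and holidays are ignored. Every number in it (handoff date, analysis duration, review and revision lengths) is a hypothetical placeholder, not a typical or recommended value, and `add_business_days` is a helper written for this example.

```python
from datetime import date, timedelta

def add_business_days(start, days):
    """Advance `start` by `days` weekdays, skipping Saturdays and Sundays.

    Holidays are ignored in this simplified model.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current

# Hypothetical inputs; agree on real values in the SOW.
handoff = date(2025, 3, 3)   # data and metadata delivered (a Monday)
analysis_days = 15           # business days to first delivery
review_rounds = 2            # revision rounds included
review_days_each = 5         # PI feedback turnaround per round
revision_days_each = 3       # analyst time to address feedback per round

first_delivery = add_business_days(handoff, analysis_days)
final_delivery = add_business_days(
    first_delivery, review_rounds * (review_days_each + revision_days_each)
)

print("First delivery:", first_delivery.isoformat())
print("Final delivery:", final_delivery.isoformat())
```

Under these assumed numbers, the review and revision rounds take slightly longer than the initial analysis, which is why feedback turnaround belongs in the timeline.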
7. Assumptions and Dependencies
This is one of the most overlooked sections — and one of the most important. Spell out what the project assumes to be true:
- Metadata completeness: Analysis assumes metadata are complete and accurate at handoff. Rework caused by incomplete or incorrect metadata may require re-scoping.
- Data quality: Analysis assumes samples passed minimum QC thresholds. QC failures discovered during analysis may limit scope or require additional work.
- Client response time: Timelines assume feedback on intermediate results is returned within an agreed number of business days.
- Access to raw files: The analyst will need access to raw data files, not just processed summaries.
- Meetings: Define how many project meetings are included and whether ad-hoc consultations are billable.
- External dependencies: If the project requires pathologist review, IRB approval, collaborator signoff, or additional data generation, note these as dependencies that affect timeline.
Making assumptions explicit protects both sides from mid-project surprises.
8. Budget and Payment Terms
In bioinformatics, cost is driven less by data size than by scope complexity: the number of comparisons, the depth of interpretation, and the level of deliverables required.
For external consultants or companies, the SOW should address:
- Fixed price vs. hourly: Fixed price gives budget certainty but requires well-defined scope. Hourly is flexible but unpredictable.
- Payment milestones: Common structures include 50% upfront / 50% on delivery, or full payment upon delivery of the final report.
- Change orders: How are additional analyses beyond the original scope priced and approved? A good SOW should define how changes are handled after work begins. Adding new comparisons, new sample groups, additional figures, or new omics layers should trigger a written scope update — not an informal email thread.
- What’s included in the fee: Are project meetings included? Is rework from changed metadata billable? Is manuscript revision support included or separate?
A Note on SOW Formality
Not every project needs a 10-page contract. In practice, SOWs take several forms:
- Lightweight pilot scope: A short email or one-pager defining a focused analysis (e.g., “DE analysis on 6 samples, two groups, volcano plot + top gene table”)
- Full analysis SOW: The comprehensive document described above, for multi-comparison, multi-omics, or publication-targeted projects
- Amendment / change order: A short addendum for additional analyses beyond the original scope
- Retainer / overflow agreement: For ongoing partnerships with recurring analysis needs
Match the formality of the SOW to the complexity and cost of the project.
Common Pitfalls
Mismatched scope and question
We’ve seen projects where analysis was completed, only for the PI to realize the comparisons didn’t match the experimental question — requiring the entire analysis to be redone. This is preventable with a clear SOW.
“Can you also…”
The three most expensive words in bioinformatics consulting. Each “can you also” is a new analysis with its own statistical considerations, visualization needs, and interpretation. Define the scope upfront and treat additions as change orders.
Incomplete metadata
If you can’t tell your analyst which samples belong to which group, the project stops. Prepare your metadata file before engaging the analyst. A CSV with columns for sample_id, group, batch, patient_id, and any relevant clinical variables is the minimum.
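A quick automated check can catch the most common metadata problems before handoff. The following is a minimal sketch using only the Python standard library. It assumes the four column names listed above; the function name `check_metadata` and the example rows are hypothetical. It checks only for missing columns, empty cells, and duplicate sample IDs, and does not judge whether the group assignments are scientifically correct.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample_id", "group", "batch", "patient_id"]

def check_metadata(handle, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in a metadata CSV."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {col}" for col in required if col not in header]
    if problems:
        return problems  # row checks are meaningless without the columns

    seen_ids = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row["sample_id"] or "").strip()
        if sample_id and sample_id in seen_ids:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        if sample_id:
            seen_ids.add(sample_id)
        for col in required:
            if not (row[col] or "").strip():
                problems.append(f"line {line_no}: empty {col}")
    return problems

# Hypothetical example file with one empty group and one repeated sample ID.
example = io.StringIO(
    "sample_id,group,batch,patient_id\n"
    "S1,treated,B1,P1\n"
    "S2,,B1,P2\n"
    "S1,control,B2,P3\n"
)
problems = check_metadata(example)
for problem in problems:
    print(problem)
```

For a real file, pass an open file handle instead, for example `check_metadata(open("metadata.csv", newline=""))`, where `metadata.csv` stands in for your own file name.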
Expecting instant turnaround
Bioinformatics is not pushing a button. A competent analyst needs time to understand your data, verify quality, troubleshoot issues, run the analysis, interpret results, and prepare deliverables. Rushing this process produces errors.
Not reviewing intermediate results
Good analysts share preliminary results (QC reports, PCA plots, initial DE results) before completing the full analysis. Review these carefully — if something looks wrong at the QC stage, it’s much cheaper to fix than after the full analysis is complete.
A Template Checklist
Use this to build your SOW:
- Biological question stated clearly
- Data platform and format specified
- Sample count and group assignments documented
- Species and tissue type noted
- Experimental design (groups, comparisons, covariates, reference level) defined
- Exploratory vs. predefined analysis scope clarified
- Specific analyses listed (DE, pathways, clustering, etc.)
- Exclusions stated (“not included in this scope”)
- Deliverable level defined (raw, interpreted, or manuscript-ready)
- Deliverable formats specified (figures, tables, report, code)
- Number of revision rounds agreed
- Assumptions and dependencies documented
- Timeline with milestones
- Client feedback turnaround expectation set
- Change order process defined
- Budget and payment terms (if applicable)
- Metadata file prepared and reviewed for completeness
- Point of contact for biological questions identified
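For teams that prefer to track the checklist in a structured form, it can be kept as a small record and checked for gaps. This is a sketch under stated assumptions: the `StatementOfWork` class, its field names, and the `missing_items` helper are all hypothetical and cover only a subset of the checklist. It is an illustration of the idea, not a standard schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional

@dataclass
class StatementOfWork:
    """Hypothetical record of SOW items; field names are illustrative only."""
    biological_question: str = ""
    platform_and_format: str = ""
    comparisons: List[str] = field(default_factory=list)
    exclusions: List[str] = field(default_factory=list)
    deliverable_level: str = ""            # raw, interpreted, or manuscript-ready
    revision_rounds: Optional[int] = None  # 0 is a valid, explicit answer
    assumptions: List[str] = field(default_factory=list)
    timeline: str = ""
    change_order_process: str = ""

def missing_items(sow: StatementOfWork) -> List[str]:
    """Return the names of fields that are still empty or unset."""
    empty_values = ("", None, [])
    return [f.name for f in fields(sow) if getattr(sow, f.name) in empty_values]

# A partially completed draft with hypothetical content.
draft = StatementOfWork(
    biological_question="Which genes differ between drug-treated and vehicle-control tumors?",
    platform_and_format="bulk RNA-seq, FASTQ",
    comparisons=["treated vs. vehicle"],
    deliverable_level="interpreted",
    revision_rounds=2,
)

print("Still to define:", missing_items(draft))
```

The check only confirms that each item has been filled in; whether the entries are specific enough still needs a human read.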
How We Scope Projects at Cytogence
At Cytogence, every engagement starts with a free project consultation. We review your experimental design, data format, and analysis goals, then provide a written scope of work with defined deliverables, timeline, and cost — including explicit assumptions, change-order process, and deliverable levels. No surprises, no open-ended billing.
We’ve seen how projects go wrong — and we’ve built our scoping process specifically to eliminate the most common failure points: unclear scope, missing metadata, undefined deliverables, and uncontrolled scope expansion. If you’ve never contracted bioinformatics work before, we’ll walk you through the process and help you define the right scope for your study.
Cytogence provides bioinformatics consulting with clear deliverables and transparent pricing. Contact us to start a conversation about your project.