Cambridge produces world-class science and, with remarkable consistency, software that nearly sinks it. Not because researchers write bad code — they write code that does exactly what a paper or a grant deliverable needed it to do. The trouble starts when a spinout forms around that science and the Jupyter notebook that generated Figure 3 quietly becomes the company's core product. We build software for biotech companies and research spinouts, and this post is about the gap between those two worlds: what it actually looks like, what most early-stage science companies genuinely need, and how to get it built without spending your runway on the wrong things.
Research code and product software are different artefacts
Research code is optimised for one thing: getting a result the author trusts. It lives in notebooks and scripts, it assumes the author's machine, the author's folder layout, and the author's memory of which cell to run first. It is usually maintained by whoever wrote it — often a postdoc — and when that person moves on, the knowledge moves with them.
None of that is a criticism. It is the correct trade-off for research. But product software has different jobs: it must run reliably for people who didn't write it, on machines the author never touched, with data the author never saw, and it must keep doing so after the team changes. The gap between the two is not "tidying up the code". It is usually:
- Implicit knowledge made explicit. The magic constants, the "always discard the first three readings", the manual step where someone renames files before the script runs. Product software has to encode all of it.
- Failure handling. Research code can crash and be re-run. A system capturing live instrument data cannot lose a run because a serial cable was unplugged for four seconds.
- Identity and audit. Who ran this, on which sample, with which protocol version? A notebook doesn't care. An investor's technical due diligence team, a regulator, or a future you absolutely will.
- Versioned environments. "It works on the lab Mac" is not a deployment strategy.
The honest answer is that most research code shouldn't be ported — it should be treated as a specification. It tells you precisely what the science requires, which is enormously valuable. The product gets rebuilt around that, usually smaller and simpler than people expect, because a lot of research code is scaffolding for experiments the company no longer needs to run.
What spinouts actually need (it's usually one of four things)
Across instrument companies, assay developers and research-tools businesses, the same needs recur. If you're a founder wondering what "getting the software sorted" means, it's almost always some combination of these.
Instrument integration and data capture
Instruments speak serial protocols, vendor SDKs, undocumented USB interfaces, or — depressingly often — they export CSVs to a shared drive and that is the integration. Getting data off instruments automatically, with timestamps, device identity and run metadata attached at the point of capture, is the single highest-leverage piece of software most lab-based companies can buy. It removes transcription errors, it makes every downstream system more trustworthy, and it's the foundation everything else sits on. This work sits right on the hardware–software boundary, which is why our embedded & hardware integration capability gets used so heavily by science clients: half the job is firmware-adjacent, not web development.
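To make "metadata attached at the point of capture" concrete, here is a minimal sketch in Python. Every name in it (CaptureRecord, capture, to_json_line, and the example device and run identifiers) is hypothetical, chosen for illustration rather than taken from any vendor SDK. It assumes the instrument's output has already arrived as a line of text; the transport itself is out of scope here.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class CaptureRecord:
    """One reading plus the context recorded at the moment it was captured."""
    device_id: str     # hypothetical identifier for the instrument
    run_id: str        # hypothetical identifier for the run
    operator: str
    captured_at: str   # ISO 8601 timestamp in UTC
    raw_line: str      # the instrument's output, stored unmodified
    raw_sha256: str    # checksum so later corruption is detectable


def capture(raw_line, device_id, run_id, operator, now=None):
    """Wrap a raw instrument line with metadata at the point of capture."""
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(raw_line.encode("utf-8")).hexdigest()
    return CaptureRecord(device_id, run_id, operator,
                         now.isoformat(), raw_line, digest)


def to_json_line(record):
    """Serialise a record as one JSON line, suitable for an append-only file."""
    return json.dumps(asdict(record), sort_keys=True)


if __name__ == "__main__":
    record = capture("OD600,0.482", device_id="plate-reader-01",
                     run_id="run-0042", operator="jsmith")
    print(to_json_line(record))
```

The design choice worth noting is that the raw line is stored untouched alongside its checksum. Parsing and unit conversion can then happen downstream, and can be redone later, without ever putting the original reading at risk.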
Sample and experiment tracking when a LIMS is too heavy
Commercial LIMS platforms are built for established labs with stable processes. An eight-person spinout iterating on its protocol weekly will fight a heavyweight LIMS constantly — and pay enterprise licensing for the privilege. What early teams usually need is far smaller: barcoded samples, a record of what was done to each one and when, links to the resulting data files, and search that works. That's a focused web application, not a six-month LIMS deployment. The trick is building it so it can grow into (or hand over to) a proper LIMS later, rather than becoming the next thing that has to be thrown away.
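As a rough illustration of how small that core can be, the sketch below models barcoded samples and their event history using Python's built-in sqlite3 module. The table layout, column names and function names are our assumptions for the example, not a recommended schema; a real system would add users, permissions and a proper search layer on top.

```python
import sqlite3

# Hypothetical two-table schema: samples, and the events that happened to them.
SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    barcode     TEXT PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS events (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    barcode      TEXT NOT NULL REFERENCES samples(barcode),
    action       TEXT NOT NULL,
    performed_at TEXT NOT NULL,   -- ISO 8601 timestamp
    data_file    TEXT             -- optional link to the resulting data file
);
"""


def open_tracker(path=":memory:"):
    """Open (or create) the tracking database with foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def register_sample(conn, barcode, description):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples (barcode, description) VALUES (?, ?)",
            (barcode, description),
        )


def record_event(conn, barcode, action, performed_at, data_file=None):
    """Record what was done to a sample; fails if the barcode is unknown."""
    with conn:
        conn.execute(
            "INSERT INTO events (barcode, action, performed_at, data_file) "
            "VALUES (?, ?, ?, ?)",
            (barcode, action, performed_at, data_file),
        )


def history(conn, barcode):
    """Return (action, performed_at, data_file) tuples in chronological order."""
    rows = conn.execute(
        "SELECT action, performed_at, data_file FROM events "
        "WHERE barcode = ? ORDER BY performed_at, id",
        (barcode,),
    )
    return rows.fetchall()
```

Because the data sits in ordinary relational tables keyed by barcode, exporting it to a proper LIMS later is a migration rather than a rewrite.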
Data pipelines with provenance
When your dataset becomes your moat, and for many research-tools companies it does, provenance stops being academic hygiene and becomes commercial necessity. Every processed result should trace back to raw files, instrument, operator, protocol version and pipeline version. Done early, this is cheap: it's mostly discipline plus a modest amount of plumbing. Retrofitted after two years of "results_final_v3_USE_THIS.xlsx", it's archaeology. Provenance is also the unglamorous prerequisite for anything involving machine learning: if you want models trained on your data, or LLM-assisted analysis over your experiment history, the lineage has to exist first. We cover that boundary in our AI & LLM integration work, and our consistent advice is that the unfashionable bit comes before the fashionable bit.
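The "modest amount of plumbing" can be sketched in a few lines of Python. This is an illustration under assumptions, not a prescribed design: the Provenance fields simply mirror the list above, and process_with_provenance and sha256_of are hypothetical helper names. The idea is that a processing step never returns a result on its own, only a result paired with a record of what produced it.

```python
import hashlib
from dataclasses import dataclass


def sha256_of(path):
    """Hash a file's contents in chunks so large raw files are handled safely."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass(frozen=True)
class Provenance:
    raw_files: dict        # file path -> SHA-256 of its contents
    instrument: str
    operator: str
    protocol_version: str
    pipeline_version: str


def process_with_provenance(raw_paths, process, **context):
    """Run `process` on the raw files and return (result, provenance).

    `context` must supply instrument, operator, protocol_version and
    pipeline_version; a missing field raises TypeError rather than
    silently producing an untraceable result.
    """
    hashes = {str(p): sha256_of(p) for p in raw_paths}
    result = process(raw_paths)
    return result, Provenance(raw_files=hashes, **context)
```

Hashing the raw files, rather than just recording their names, is what lets you tell later whether "the same file" really is the same file.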
Dashboards for people who don't open notebooks
Your investors, your commercial lead and your collaborators at a partner organisation should not need a Python environment to know how the science is going. A dashboard showing runs completed, yield trends, QC pass rates — fed automatically from the systems above — changes the texture of board meetings and partner calls. It's often the most visible return on the whole effort, even though it's technically the simplest layer.
The spinout reality: budgets, regulation, IP
Generic software advice tends to ignore the constraints that actually shape decisions at a science company.
Grant-shaped money. Budgets often arrive as Innovate UK milestones or tranche-gated investment, with deliverables and deadlines attached. Software work needs to be scoped to match: discrete phases with demonstrable outputs, not an open-ended retainer. A good partner will structure engagements around your funding reality rather than pretending it doesn't exist.
Regulatory awareness without GxP overkill. If you're heading towards diagnostics or therapeutics, GxP, 21 CFR Part 11 and friends are in your future. They are mostly not in your present. The mistake we see in both directions: teams that ignore regulation entirely and build systems that can never be validated, and teams that gold-plate an MVP to full compliance standards and burn a year doing it. The pragmatic middle is building with a validation-ready posture — audit trails, access control, immutable raw data, documented data flows — while deferring the formal validation effort until the product and the regulatory pathway are actually settled. That posture costs perhaps ten per cent more at MVP stage and saves multiples of that later.
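One inexpensive ingredient of that posture is an audit trail where edits after the fact are detectable. The Python sketch below chains each entry's hash to the previous one, so changing an old entry invalidates everything after it. The function names and entry fields are hypothetical, and this is only a tamper-evidence technique: it is not, on its own, a claim of compliance with 21 CFR Part 11 or any other standard.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, entry):
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log, user, action, timestamp):
    """Append an audit entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"user": user, "action": action, "timestamp": timestamp}
    log.append({**entry, "hash": _digest(prev_hash, entry)})


def verify(log):
    """Return True only if no entry has been altered, removed or reordered."""
    prev_hash = GENESIS
    for record in log:
        entry = {k: v for k, v in record.items() if k != "hash"}
        if record["hash"] != _digest(prev_hash, entry):
            return False
        prev_hash = record["hash"]
    return True
```

For example, after appending two entries verify(log) returns True; edit the first entry's action in place and it returns False. In practice the log would live in append-only storage rather than a Python list, but the chaining idea is the same.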
IP sensitivity. Your protocol, your training data and your assay design may be the entire company. That has practical software consequences: be deliberate about what goes into third-party SaaS, what leaves the building (or the country), and what terms your software partner signs. We work under NDA as a matter of course and we're comfortable with arrangements where the sensitive science stays compartmentalised — we need to understand your data's shape, not necessarily its secrets.
Why hardware-and-software capability matters
If your product touches an instrument — yours or someone else's — a web-only development shop will struggle at exactly the point where your problems are hardest. Instrument-adjacent work means reading vendor protocol documents of varying honesty, debugging timing issues with a logic analyser rather than a browser console, handling devices that disconnect mid-run, and designing software that degrades gracefully when hardware misbehaves — because hardware always, eventually, misbehaves.
This is where the way we work is a genuine advantage rather than a slogan. We're an AuDHD-founded company and we build differently: hyperfocus means when we take on your instrument's undocumented serial behaviour, we are fully inside that problem until it yields. Pattern recognition built across embedded systems, cloud platforms and data engineering means we spot the architecture where the firmware, the capture service and the analysis pipeline fit together cleanly — instead of three teams building three things that meet badly in the middle. And systems thinking end to end means nobody has to be the integrator of last resort: that's the job we take on.
Engaging a software partner without burning runway
A few honest recommendations, including ones that cost us work:
- Start with a short, paid discovery. One to two weeks. Output: a map of your data flows, the riskiest technical unknowns, and a phased plan with costs. If a partner won't show you their thinking before asking for a six-month commitment, walk away.
- Build the smallest system that removes your worst manual step. Not the platform. Not the vision. The one thing that's costing you hours or corrupting data every week. Prove the partnership on that.
- Insist on owning everything. Your repositories, your cloud accounts, your documentation. Any arrangement where the supplier holds the keys is a future hostage negotiation.
- Don't hire a full-time software team too early. A founding-stage science company rarely has enough sustained software work to keep good engineers engaged, and the hiring itself eats months. A partner who can flex up for a build phase and down to maintenance is usually better economics until the product demands otherwise — at which point a good partner helps you hire and hands over cleanly.
- Ask hypothetical-you questions. Imagine your lead developer, the postdoc who wrote everything, leaves in three months. Could anyone else run the system? If a proposal doesn't make the answer "yes", it isn't a product proposal.
Being genuinely local matters more in this niche than in most. Instrument work means being physically in the lab; data-sensitivity conversations go better across a table. If you want the broader picture of how we work with companies here, see our page on software development in Cambridge. Research software development in Cambridge is, unapologetically, the work we set this company up to do.
Talk to us
If you're a spinout staring at a folder of scripts and wondering how it becomes a product — or a research-tools company whose lab data management is held together by naming conventions — we're happy to have a no-obligation conversation about it. Email hello@overclockminds.co.uk or use our contact page. Worst case, you leave with a clearer map of the problem.