
When I talk to engineers about the differences between building research software and commercial products, I often get a knowing nod. But when I press them on specifics, the conversation gets murky. “Research software is messier,” they’ll say. Or “product software has to scale.” These are true, but they miss the deeper structural differences that shape how we build, maintain, and think about software in these two worlds.
Having spent years working on both sides, I’ve come to appreciate that these aren’t just different points on a spectrum of software quality. They’re fundamentally different problems that demand different approaches.
The Complexity Paradox
Here’s something we often don’t think about: both research and product software can be equally complex, but in very different ways.

Consider Instagram. At its core, the product does something conceptually simple: it stores photos, applies filters, and displays them in a feed. But Instagram serves billions of users across every timezone, device type, and network condition you can imagine. The complexity isn’t in what it does; it’s in doing it reliably for an impossibly diverse population at massive scale.
Now think about a seismic data pipeline I worked on that analyzed cross-correlations of ambient noise signals. The algorithm involved Fourier transforms, domain-specific file formats, and complex plots, and it required deep knowledge of both the math and the data characteristics. But it has had ~200 users, all of them seismologists running it on similar datasets. We processed data at scale, but in a much more controlled environment. The complexity was in the what: the actual computation being performed.
This creates a strange inversion. The Instagram engineer worries about edge cases like: What happens when someone uploads a photo from a flip phone in rural Mongolia with 2G connectivity? The research software engineer worries about: Did I implement the signal normalization and whitening correctly?
Both are hard problems, but they’re different kinds of hard. And they require different skill sets. A brilliant algorithmic thinker might struggle with distributed systems and API design. A talented product engineer might be lost in the mathematical intricacies of a numerical solver.
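To make the research-side worry concrete, here is a minimal sketch of the kind of preprocessing the pipeline above depends on: spectral whitening followed by frequency-domain cross-correlation. The function names and the epsilon guard are my own illustrative choices, not the actual pipeline’s code; getting small details like the stabilization term or the zero-lag convention wrong is exactly the kind of bug that keeps a research software engineer up at night.

```python
import numpy as np

def spectral_whiten(trace, eps=1e-10):
    """Flatten a signal's amplitude spectrum while keeping its phase.

    A common step before ambient-noise cross-correlation: dividing each
    frequency bin by its own magnitude leaves mostly phase information,
    so no single frequency band dominates the correlation.
    """
    spectrum = np.fft.rfft(trace)
    # eps keeps near-empty bins from blowing up to huge values
    whitened = spectrum / (np.abs(spectrum) + eps)
    return np.fft.irfft(whitened, n=len(trace))

def cross_correlate(a, b):
    """Circular cross-correlation of two equal-length traces via FFT."""
    n = len(a)
    fa, fb = np.fft.rfft(a), np.fft.rfft(b)
    cc = np.fft.irfft(fa * np.conj(fb), n=n)
    return np.roll(cc, n // 2)  # put zero lag at the center of the array
```

None of this is hard to type, but verifying that it matches the textbook definition, and the data, is where the real effort goes.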
The Prototype Dilemma
One of the most nuanced aspects of research software is what I call the “prototype dilemma.” When should you stop treating your code like an experiment and start treating it like a product?
In traditional software development, this transition point is usually more explicit. You build an MVP, you get user feedback, you see traction (or you don’t), and you make a decision. The feedback loop is fast, and the stakes are understood.
Research software doesn’t work this way. You start with a hypothesis. Maybe you’re building a tool to analyze satellite imagery, or a simulator for protein folding, or a framework for causal inference. Initially, it’s just you (and maybe one collaborator) trying to see if the idea even works. The code is exploratory. You’re learning as you go.
Then something shifts. Maybe a paper gets accepted, and other researchers want to try your method. Maybe a collaborator at another institution asks for the code. Maybe your advisor suggests it could be useful for the three other grad students in the lab. Suddenly, you have users, but you never made a conscious decision to “launch.”
This is where things get messy. The right move would be to step back and refactor, add proper documentation, write tests, and clean up that configuration system that’s currently a jumble of hardcoded values. But there’s always pressure to add the next feature, run the next experiment, or write the next paper. While technical debt happens in product software too, it’s usually better understood by the organization. Research software can limp along indefinitely in this half-finished state.
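The configuration cleanup mentioned above is worth a concrete sketch. One common move is to replace values hardcoded across modules with a single explicit config object; the names and defaults here are hypothetical, just to show the shape of the refactor:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PipelineConfig:
    """One documented home for values that were scattered as constants."""
    sample_rate_hz: float = 20.0
    window_seconds: int = 3600
    whiten: bool = True
    output_dir: str = "results"

def run_pipeline(cfg: PipelineConfig) -> str:
    # Every step reads from the explicit object instead of module globals,
    # so a new user can see (and override) every knob in one place.
    return f"processing {cfg.window_seconds}s windows at {cfg.sample_rate_hz} Hz"
```

The payoff isn’t elegance; it’s that the fourth grad student who picks up the code can discover every assumption without grepping for magic numbers.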
I’ve seen research projects struggle under the weight of their own success because no one ever made the call to transition from prototype to maintained software. And I’ve seen projects that invested heavily in engineering too early, only to discover the research direction was wrong and all that infrastructure was wasted.
There’s no easy answer here, but I think the key is to be explicit about what phase you’re in and to set clear triggers for transition. If your code is being used by more than 3-4 people outside your immediate team, it’s probably time to invest in making it maintainable. If it’s still just you exploring an idea, keep it scrappy.
The Scientist-as-Product-Manager Problem
Every software project needs someone playing the product manager role: someone who decides what gets built, prioritizes features, and represents user needs. In commercial software, this is a specialized job. PMs develop skills in user research, prioritization frameworks, and stakeholder management.
In research software, this role almost always falls to a scientist. And here’s the thing: they’re often uniquely qualified in some ways and dangerously unqualified in others.
The scientist knows the domain inside out. They understand the research questions, the constraints of the data, and the needs of the users (who are often their colleagues). This deep expertise is invaluable; you can’t build good computational chemistry software without understanding chemistry.
But they typically lack training in product thinking. They might not know how to gather and prioritize requirements. They might optimize for their own use case rather than the broader community. They might say yes to every feature request because they don’t have frameworks for thinking about scope and maintenance burden.
The best research software projects I’ve seen have involved all the appropriate roles: a scientist who is a domain expert, and both PMs and engineers who take the time to learn the domain.
What I Haven’t Covered (Yet)
This post just scratches the surface. Here are some other dimensions worth exploring:
The Open Source Question: Research software is often open source by necessity or culture, while commercial software is frequently proprietary. How does this shape development practices, quality standards, and sustainability models?
The Paper-Driven Development Problem: When your primary incentive is to publish papers rather than maintain software, the incentives misalign. Software engineers in industry are rewarded for writing maintainable, well-tested code. Research software engineers (when they exist as a distinct role) are often evaluated on research output instead, pulling their effort away from what the software needs long term.
The Funding Cliff: Most commercial software is funded by revenue or clear business models. Research software is funded by grants that end, creating a sustainability crisis that doesn’t have good solutions.
Maybe I’ll tackle some of these in future posts. For now, I’d love to hear from others who’ve worked in both worlds: What differences have you noticed? What surprised you? What do you wish more people understood about research software?

