Computational Biology: Thinking in Five Dimensions

Almost nobody starts out as a computational biologist. You start as something else (a biologist, a programmer, a mathematician, a chemist), and then the problems you care about drag you across disciplinary boundaries until you realize you are doing something that does not have a clean name. You are not quite a biologist, because you spend most of your time writing code. You are not quite a software engineer, because the code you write is governed by biochemical constraints, not user stories. You are not quite a mathematician, because your equations are meant to describe cells, not prove theorems.

You are, instead, a person who has learned to think in multiple registers simultaneously, switching between the intuitions of different disciplines depending on what the problem demands. This is what makes computational biology both uniquely powerful and uniquely difficult. The power comes from the synthesis. The difficulty comes from the fact that each discipline has its own way of seeing the world, its own standards of rigor, its own definition of what counts as a good answer.

This essay is an attempt to articulate what those different modes of thinking are, how they interact, and why the best computational biologists are the ones who can hold all of them in their heads at the same time.

The Biologist's Intuition: Thinking in Mechanisms

The biologist's fundamental mode of thinking is mechanistic. When a biologist looks at a dataset, they are not looking for statistical patterns in the abstract. They are looking for a story: a causal chain that connects molecules to functions to phenotypes. The biologist asks: what is happening inside the cell? Which genes are being expressed, and why? What does this protein do, and what happens when it is absent? How does this pathway connect to the phenotype we observe?

This mode of thinking is rooted in evolution, which is the biologist's master framework. Every molecular mechanism exists because it was selected for. Every regulatory circuit, every metabolic pathway, every protein fold is a solution to a problem that evolution posed. The biologist's intuition is shaped by this: when something in biology looks arbitrary, it usually is not. There is a reason. Finding it is the work.

The danger of this mode of thinking is narrative bias. Biological mechanisms are compelling stories, and humans are wired to find stories satisfying even when the evidence is thin. A biologist can construct a plausible mechanistic explanation for almost any result, which is exactly why mechanistic thinking must be disciplined by statistics, by quantification, and by falsifiable prediction. This is where the mathematician and the scientist come in.

The biologist's gift is the question: what is this for? The biologist's danger is assuming there is always a clean answer.

The Software Engineer's Rigor: Thinking in Systems

The software engineer thinks in systems, not mechanisms. Where the biologist asks "what is happening?", the engineer asks "how should I structure this so it works reliably, scales efficiently, and can be maintained by someone who is not me?"

In computational biology, the engineering mindset shows up in pipeline design, code architecture, testing strategy, deployment, and reproducibility. The engineer knows that code that works on your laptop does not necessarily work on someone else's. The engineer knows that a function without a test is a liability. The engineer knows that naming things well, structuring data clearly, and documenting decisions are not overhead. They are the work.

The engineer's contribution to computational biology is often invisible but essential. When a bioinformatics pipeline processes 500 sequencing samples overnight without failing, that is engineering. When a scientific web application handles 50 concurrent users without crashing, that is engineering. When a collaborator can reproduce your analysis from your GitHub repository six months after you published it, that is engineering.

The danger of pure engineering thinking in biology is over-abstraction. A beautifully architected pipeline that solves the wrong biological problem is worse than a messy script that solves the right one. The engineer must always be in dialogue with the biologist about what actually needs to be computed and why. Architecture without domain knowledge produces technically excellent systems that nobody uses.

Where Engineering Meets Biology: The Design of Data Structures

One of the most underappreciated skills in computational biology is the design of data structures that faithfully represent biological relationships. How you structure your data determines what questions you can ask efficiently. A gene expression matrix where rows are genes and columns are samples encodes a specific view of the data. A graph database where nodes are genes and edges are regulatory interactions encodes a different view. An anndata object that stores expression, cell metadata, gene metadata, and dimensionality reductions together encodes yet another.

Each representation makes some analyses trivial and others impossible. The engineer's job is to choose representations that align with the biological questions, not the other way around.
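
As a rough illustration of that trade-off, here is a minimal Python sketch that stores toy data in two of these forms. The gene names, expression values, and regulatory edges are invented for the example, and the helper functions are hypothetical rather than part of any library.

```python
# Toy data: names and values are invented for illustration only.
genes = ["geneA", "geneB", "geneC"]
samples = ["s1", "s2"]

# View 1: expression matrix, rows are genes, columns are samples.
expression = [
    [5.0, 7.0],  # geneA
    [0.0, 2.0],  # geneB
    [3.0, 3.0],  # geneC
]

# View 2: regulatory graph, each gene maps to the set of genes it regulates.
regulates = {
    "geneA": {"geneB", "geneC"},
    "geneB": {"geneC"},
    "geneC": set(),
}


def mean_expression(gene):
    """Easy on the matrix: one row lookup and an average."""
    row = expression[genes.index(gene)]
    return sum(row) / len(row)


def downstream(gene):
    """Easy on the graph: follow outgoing edges until nothing new appears."""
    seen, stack = set(), [gene]
    while stack:
        for target in regulates[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


print(mean_expression("geneA"))     # 6.0
print(sorted(downstream("geneA")))  # ['geneB', 'geneC']
```

Neither question can be answered from the other structure alone: the matrix holds no edges, and the graph holds no measurements.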

The Mathematician's Abstraction: Thinking in Structures

The mathematician sees past the biological details to the underlying structure. Where the biologist sees a metabolic network and the engineer sees a data pipeline, the mathematician sees a directed graph with flux constraints and asks: what are the properties of this graph? What can I prove about it? What are its symmetries, its invariants, its limits?

Mathematical thinking in computational biology is what lets you recognize that two apparently different problems are actually the same problem. Sequence alignment and speech recognition share the same dynamic programming structure. Distance-based phylogenetic tree construction, as in UPGMA, is a form of hierarchical clustering. Flux balance analysis is a linear programming problem. Hidden Markov models for gene prediction and hidden Markov models for speech recognition are the same statistical framework applied to different data.

This ability to see structural similarity across domains is one of the most valuable forms of expertise a computational biologist can develop. It means that when you encounter a new problem, you do not start from scratch. You recognize its mathematical structure and import the entire toolkit that has been developed for that structure in other fields.
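
One way to see this in miniature is a sketch of the textbook Needleman-Wunsch global alignment score, written so that it accepts any sequence of comparable items. The scoring values (match +1, mismatch -1, gap -1) are arbitrary choices for the example, and running it on word tokens only illustrates the shared recurrence. It is not a speech recognizer.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score for two sequences of comparable items."""
    # prev[j] is the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two items, gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# The same recurrence on nucleotides...
print(global_alignment_score("GATTACA", "GATACA"))  # 5

# ...and on word tokens.
print(global_alignment_score(
    ["the", "cell", "divides"],
    ["the", "cell", "slowly", "divides"],
))  # 2
```

The function never asks what the items are, only whether two of them are equal. That indifference to the domain is the structural similarity in question.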

The danger of pure mathematical thinking in biology is mistaking the model for the reality. A genome-scale metabolic model is a linear programming problem, but the cell is not a linear programming problem. The model captures some aspects of cellular metabolism and completely ignores others (kinetics, regulation, stochasticity, spatial organization). The mathematician who forgets this produces elegant models that are biologically meaningless. The mathematician who remembers it produces models that illuminate specific aspects of biology with precision, while being honest about what they leave out.
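
For reference, flux balance analysis in its standard textbook form can be written as the linear program below. The symbols are the conventional ones rather than anything defined in this essay: S is the m-by-n stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c holds the objective weights (often selecting a biomass reaction), and v^min and v^max are lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v \in \mathbb{R}^{n}} \quad & c^{\top} v \\
\text{subject to} \quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

Everything in this program is linear and time-independent. The constraint S v = 0 is steady-state mass balance, and the bounds encode capacity and reversibility. No rate law, regulatory term, noise term, or spatial coordinate appears anywhere, which is the precise sense in which the model leaves those aspects out.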

The Philosopher's Skepticism: Thinking About Thinking

This is the mode of thought that most computational biologists develop last, if they develop it at all. And it may be the most important.

The philosopher asks questions about the questions. Not "what is the answer?" but "what would count as an answer?" Not "is this model correct?" but "what do we mean by correct, and how would we know?" Not "what does this gene do?" but "what does it mean to say a gene does something, given that gene function is context-dependent, polygenic, and probabilistic?"

Philosophical thinking in computational biology shows up in places most people do not notice. When you choose a metric to evaluate your model, you are making a philosophical decision about what matters. When you define a cell type from single-cell data, you are making a philosophical decision about what constitutes a natural category. When you report a p-value, you are operating within a specific philosophical framework (frequentist hypothesis testing) that has assumptions and limitations that most biologists never examine.

The Problem of Induction in Biological AI

Here is where philosophical thinking becomes urgent for computational biology in 2026. Foundation models like ESM-2 and scGPT are trained on existing data and make predictions about new data. The implicit assumption is that the patterns in the training data will generalize. But how do we know they will? This is the problem of induction, first articulated by David Hume in 1739 and still not solved.

In practice, this manifests as distributional shift: a model trained on protein sequences from well-studied organisms may fail on sequences from uncharacterized organisms. A model trained on single-cell data from healthy tissue may fail on data from disease. The philosopher's contribution is to make these assumptions explicit, so that the biologist can evaluate whether they hold and the engineer can build systems that detect when they do not.
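
To make "detect when they do not" slightly more concrete, here is a deliberately crude sketch of a shift detector. It assumes amino-acid composition is a usable proxy for the training distribution, which is a strong simplification. A real system would more plausibly compare model embeddings. The class name, the slack factor, and the toy sequences are all hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each standard amino acid in a non-empty sequence."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def distance(p, q):
    """Euclidean distance between two composition vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


class CompositionShiftDetector:
    """Flags sequences whose composition is far from the training centroid."""

    def __init__(self, training_seqs, slack=1.5):
        vectors = [composition(s) for s in training_seqs]
        n = len(vectors)
        self.centroid = [sum(col) / n for col in zip(*vectors)]
        # Threshold: largest training distance, widened by an assumed slack factor.
        self.threshold = slack * max(distance(v, self.centroid) for v in vectors)

    def is_shifted(self, seq):
        return distance(composition(seq), self.centroid) > self.threshold


# Toy strings, not real proteins.
detector = CompositionShiftDetector(["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIKN"])
print(detector.is_shifted("ACDEFGHIKL"))  # False
print(detector.is_shifted("WWWWWWWWWW"))  # True
```

The check itself is not the point. What matters is that the assumption "new inputs resemble the training data" has become an explicit, testable line of code instead of an unstated hope.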

The Scientist's Discipline: Thinking in Evidence

The scientist binds all of the other modes together with a single commitment: the commitment to evidence. The biologist's mechanistic story must be tested against data. The engineer's system must produce reproducible results. The mathematician's model must make falsifiable predictions. The philosopher's framework must be evaluated by whether it leads to better science.

The scientific mode of thinking in computational biology means: always asking "how would I know if I were wrong?" Designing controls. Validating models on held-out data. Checking edge cases. Being suspicious of results that are too clean. Publishing your code and data so that others can verify your work.

It also means being comfortable with uncertainty. In biology, most questions do not have definitive answers. They have provisional answers supported by evidence of varying strength. The scientist's job is to quantify that strength honestly, communicate it clearly, and update their beliefs when new evidence arrives.

The computational biologist who only writes code is an engineer. The one who only thinks about mechanism is a biologist. The one who only proves theorems is a mathematician. The one who integrates all of them, bound together by the discipline of evidence, is something more than any of them alone.

How the Modes Interact in Practice

Let me give a concrete example from my own work. When I built a genome-scale metabolic model of Halomonas elongata, a halophilic bacterium that produces bioplastics, all five modes of thinking were in play simultaneously.

The biologist in me asked: what is this organism doing metabolically in high-salt environments? How does it allocate carbon between growth and bioplastic production? What regulatory signals control the switch?

The mathematician in me represented the metabolic network as a stoichiometric matrix and formulated the flux balance analysis as a linear programming problem, understanding that this captured the steady-state mass balance constraints but not the kinetics.

The engineer in me structured the code so that the model could be parameterized with different growth conditions, run in batch, and have its outputs compared against experimental measurements automatically.

The philosopher in me asked: what does it mean for this model to be "correct"? It predicts growth rates and flux distributions, but it cannot capture the stochastic fluctuations that might determine whether a single cell enters bioplastic production mode. Is the steady-state assumption valid? When does it break?

The scientist in me validated the model against experimental growth data, identified where predictions diverged from measurements, used those divergences to generate new hypotheses about missing regulatory interactions, and designed experiments to test them.

No single mode of thinking could have produced that workflow. It required all five, in conversation with each other, constantly checking and correcting each other's blind spots.
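
The comparison step in that workflow, where predictions are checked against measurements and divergences become hypotheses, can be sketched in a few lines. This is an illustrative outline rather than the code used for the Halomonas model. The function name, the 20% tolerance, the condition labels, and all numbers are placeholders invented for the example.

```python
def flag_divergences(predicted, measured, rel_tol=0.2):
    """Return conditions whose relative prediction error exceeds rel_tol.

    Assumes every measured value is non-zero.
    """
    flagged = {}
    for condition, observed in measured.items():
        if condition not in predicted:
            continue  # nothing to compare against
        rel_error = abs(predicted[condition] - observed) / abs(observed)
        if rel_error > rel_tol:
            flagged[condition] = round(rel_error, 3)
    return flagged


# Placeholder growth rates (1/h), invented for illustration; not real data.
predicted = {"low_salt": 0.50, "mid_salt": 0.40, "high_salt": 0.30}
measured = {"low_salt": 0.48, "mid_salt": 0.41, "high_salt": 0.15}

print(flag_divergences(predicted, measured))  # {'high_salt': 1.0}
```

A flagged condition is not a failure of the pipeline. It marks where the model's assumptions stop holding, and so where the next experiment should look.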

Developing These Modes of Thinking

Practical advice for building an interdisciplinary mind

Read outside your training. If you were trained as a biologist, read software engineering books (The Pragmatic Programmer, Clean Code). If you were trained as a programmer, take a serious biology course. If you have never studied philosophy of science, read Thomas Kuhn or Karl Popper. The goal is not to become an expert in everything. It is to develop enough fluency in each mode of thinking that you can recognize when a problem requires it.
Work with people who think differently than you. The fastest way to develop a new mode of thinking is to collaborate with someone who already has it. The engineer will teach you about testing. The biologist will teach you about mechanism. The mathematician will teach you about structure. You will teach them about whatever you bring. This is not just collaboration. It is apprenticeship in a different way of seeing.
Build things. There is no substitute for the act of building a tool, a model, a pipeline that someone else will use. Building forces you to confront every mode of thinking simultaneously: the biology has to be right, the code has to work, the math has to hold, the assumptions have to be explicit, and the results have to be validated. Building is where the integration happens.
Write about your work. Writing forces clarity. If you cannot explain your model's assumptions in plain language, you do not understand them well enough. If you cannot describe why your engineering choices matter for the science, the connection is not clear in your own mind. Writing is thinking made visible.
Sit with discomfort. The hardest part of interdisciplinary work is the persistent feeling that you are not expert enough in any one area. This feeling never entirely goes away. It is the tax you pay for breadth. The reward is that you can see connections that specialists miss, ask questions that specialists do not think to ask, and build things that no single discipline could produce alone.

Computational biology is hard because it demands a kind of thinking that no single training program teaches. You have to learn it by doing it, by failing at it, by noticing when you are thinking like a biologist when you should be thinking like a mathematician, or like an engineer when you should be thinking like a philosopher.

But it is also, I think, one of the most intellectually rewarding fields that exists right now. The problems are real and urgent. The tools are powerful and getting more powerful. The biology is inexhaustible. And the need for people who can hold multiple modes of thinking together, who can bridge between the code and the cell, between the equation and the experiment, between the model and the meaning, has never been greater.

That is the work I have chosen. I am still learning how to do it well. I suspect I always will be.

Blaise Manga Enuh, PhD

Computational biologist and bioinformatics engineer at the Great Lakes Bioenergy Research Center. I build ML models, bioinformatics pipelines, and AI-augmented scientific software tools at the intersection of microbial biology and machine learning.
