Before the Model, There Was the Noise

There is a story that gets told about the history of biology in which knowledge marches forward in a roughly linear direction: we discover the cell, then the gene, then the double helix, then the genome, and now the AI. It is a tidy story and it is mostly wrong. The real history is messier, full of ideas that arrived before the tools to test them, tools that arrived before the conceptual frameworks to interpret them, and entire decades in which the most important variable in a biological system was being systematically treated as a nuisance to be averaged away.

That variable is noise. Randomness. Stochasticity.

The history of stochastic approaches in biology is not the history of a single discovery. It is the history of a community slowly and unevenly learning to take randomness seriously as a first-class property of living systems rather than a symptom of insufficient experimental control. It stretches from a monk counting pea plants in a monastery garden in the 1860s to Gillespie's algorithm running on a laptop in 2026 to simulate gene expression in a single bacterium. The arc is long, and understanding it matters if you want to understand where the field is going next.

The Probabilistic Roots: Mendel, Galton, and the Mathematics of Inheritance

The first truly stochastic model in biology was not recognized as such when it was published. Gregor Mendel's 1866 paper, "Versuche über Pflanzenhybriden" (Experiments on Plant Hybridization), presented the results of eight years of controlled crosses in garden peas. The ratios he reported, 3:1 for dominant to recessive traits in the second generation, were not explained by Mendel as deterministic outcomes. They were explained as the result of random assortment of discrete hereditary units during gamete formation.

This is, in retrospect, a fully probabilistic model. Each gamete independently samples one allele from a two-allele pool. The 3:1 ratio is an expectation over many such samples, not a guaranteed outcome of any specific cross. Mendel understood this: he designed his experiments with large sample sizes precisely because he knew the ratios would only emerge reliably across many trials. The randomness was baked into the model from the beginning.
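The sampling argument can be made concrete with a few lines of Python. This is a minimal sketch, not a reconstruction of Mendel's data: the function name `f2_dominant_fraction` and the allele labels `A` and `a` are illustrative assumptions. It crosses two heterozygotes and reports the fraction of offspring showing the dominant trait at several sample sizes.

```python
import random

def f2_dominant_fraction(n_offspring, rng):
    """Cross two Aa heterozygotes; return the fraction of offspring
    that show the dominant phenotype."""
    dominant = 0
    for _ in range(n_offspring):
        # Each parent passes on one of its two alleles, chosen at random.
        maternal = rng.choice("Aa")
        paternal = rng.choice("Aa")
        # The dominant phenotype appears if at least one 'A' is present.
        if "A" in (maternal, paternal):
            dominant += 1
    return dominant / n_offspring

rng = random.Random(1866)  # fixed seed so the run is reproducible
for n in (10, 100, 10_000):
    print(n, f2_dominant_fraction(n, rng))
```

Small samples scatter widely around 0.75, and only the large ones settle near it, which is the reason large sample sizes mattered to the experimental design.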

The broader community did not read it that way, and the paper sat largely unread for thirty-four years. When it was rediscovered independently by de Vries, Correns, and Tschermak in 1900, the reaction was still not primarily probabilistic. The focus was on the discrete units of inheritance, what would become genes, not on the stochastic mechanism of their transmission.

Francis Galton approached the problem from the other direction. Working on human inheritance in the 1880s and 1890s, he was confronted with continuous traits that did not follow Mendel's discrete ratios. His response was to reach for statistics. He developed regression to the mean, introduced the concept of correlation, and with Karl Pearson built the biometric school of inheritance, a framework built entirely on statistical descriptions of population-level variation. The irony is that Galton's approach was explicitly probabilistic without being mechanistic. He could describe the distribution of trait values across a population with great precision. He had no model of what was generating that distribution at the molecular level.

Historical Note

The Mendelian and biometric schools were in explicit conflict for the first two decades of the twentieth century, a dispute that was partly scientific and partly sociological. Both sides were right about something. Mendel had the mechanism and the probability model. Galton had the statistical tools for handling continuous variation. The Modern Synthesis of the 1930s, primarily the work of Fisher, Wright, and Haldane, reconciled them by showing that continuous variation could arise from many Mendelian loci acting together, each following the same probabilistic rules of assortment.

Physics Delivers the Framework: Brownian Motion and the Statistical Mechanics of Life

The conceptual tools that would eventually make stochastic modeling of biological systems possible did not originate in biology. They came from physics, and the pathway from physics to biology was neither direct nor quick.

In 1905, in the same year he published special relativity, Albert Einstein published a paper on the motion of small particles suspended in a liquid, the phenomenon first described by botanist Robert Brown in 1827. Brown had observed that pollen grains in water moved continuously and randomly under the microscope. For nearly eighty years this was attributed to convection currents, vibrations, or biological activity. Brown himself showed it was not alive by using ground glass particles, but he could not explain the cause.

Einstein's explanation was that the motion was caused by the thermal bombardment of water molecules. The particle is hit from all directions by molecules moving with a distribution of velocities determined by temperature, and the net force at any moment is random. His paper provided the mathematical description of this diffusion process and, crucially, predicted that the mean squared displacement of a particle should increase linearly with time, a prediction that Jean Perrin confirmed experimentally in 1908. The result was the first rigorous mathematical treatment of a random process with physical consequences.
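The linear growth of mean squared displacement is easy to check numerically. The sketch below assumes a one-dimensional walk with independent Gaussian steps, for which the expected value is 2Dt; the function name, the diffusion coefficient, and the time step are illustrative choices in arbitrary units, not values from Einstein's or Perrin's work.

```python
import random

def mean_squared_displacement(n_particles, n_steps, diffusion_coeff, dt, rng):
    """Return the mean squared displacement after each step for
    independent 1D Brownian particles that all start at the origin."""
    sigma = (2.0 * diffusion_coeff * dt) ** 0.5  # std of one Gaussian step
    positions = [0.0] * n_particles
    msd = []
    for _ in range(n_steps):
        positions = [x + rng.gauss(0.0, sigma) for x in positions]
        msd.append(sum(x * x for x in positions) / n_particles)
    return msd

rng = random.Random(1905)
D, dt = 1.0, 0.01  # assumed illustrative units
msd = mean_squared_displacement(2000, 100, D, dt, rng)
for step in (10, 50, 100):
    t = step * dt
    print(f"t={t:.2f}  simulated MSD={msd[step - 1]:.3f}  2Dt={2 * D * t:.3f}")
```

Any single particle wanders unpredictably; the linear law only appears in the average over many particles.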

The relevance to biology is not immediately obvious, but it is profound. Diffusion governs how molecules move inside cells. Transcription factors finding their target DNA binding sites, signaling molecules traversing the cytoplasm, ions moving across membranes: all of these processes are fundamentally Brownian at the molecular scale. The stochastic behavior of gene expression is, at its deepest level, a consequence of the same thermal noise that makes pollen grains jitter under a microscope.

1920s to 1940s

Population Genetics and the Stochastic Theory of Evolution

While physicists were formalizing the mathematics of random processes, biologists were discovering that randomness was a central driver of evolution, not just a source of measurement error. The key figure here is Sewall Wright, whose concept of genetic drift, introduced in a landmark 1931 paper, established that allele frequencies in finite populations change from generation to generation due to random sampling, independently of natural selection.

Wright's argument was simple and initially controversial: in a diploid population of finite size N, the next generation is built from only 2N allele copies, and the sampling of those copies from the parental pool is a random process. In a large population, this sampling noise is small relative to selection. In a small population, it can overwhelm selection entirely. An allele can be lost by chance even if it is beneficial. An allele can fix by chance even if it is neutral or slightly deleterious.
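A minimal simulation shows how strongly the outcome depends on population size. The sketch assumes the standard idealization now called the Wright-Fisher model (non-overlapping generations, no selection, no mutation); the function name and parameter values are illustrative.

```python
import random

def wright_fisher(pop_size, initial_freq, generations, rng):
    """Neutral drift in a diploid population of pop_size individuals.
    Returns the allele-frequency trajectory, stopping at loss or fixation."""
    n_alleles = 2 * pop_size
    freq = initial_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn independently
        # from the parental pool (binomial sampling).
        copies = sum(1 for _ in range(n_alleles) if rng.random() < freq)
        freq = copies / n_alleles
        trajectory.append(freq)
        if freq in (0.0, 1.0):  # allele lost or fixed: nothing left to sample
            break
    return trajectory

rng = random.Random(1931)
for n in (10, 1000):
    traj = wright_fisher(n, 0.5, 200, rng)
    print(f"N={n}: frequency {traj[-1]:.3f} after {len(traj) - 1} generations")
```

With N=10 the allele typically reaches 0 or 1 within a few dozen generations, while with N=1000 the frequency usually stays near its starting value over the same span.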

This was a direct challenge to the adaptationist program, the view that every evolved trait is an adaptive response to selection. Wright's model said evolution was partly stochastic and that the fraction of evolution attributable to drift versus selection depended on effective population size, a parameter that could in principle be estimated but was not directly observable. It introduced irreducible uncertainty into the theory of evolution at a conceptual level, not just a practical one.

Simultaneously, Ronald Fisher was developing the mathematical theory of natural selection as a deterministic process in large populations. The tension between Wright's stochastic view and Fisher's deterministic view generated a productive controversy that ran for decades and shaped the theoretical foundations of population genetics.

Randomness entered evolutionary biology not as a confession of ignorance but as a mechanistic claim: that finite populations sample their own gene pools imperfectly, and that this imperfection has evolutionary consequences that selection alone cannot explain.

The Molecular Revolution Encounters Noise and Mostly Ignores It

The discovery of the double helix in 1953 inaugurated a period of extraordinary progress in molecular biology, and it did something interesting to stochastic thinking in the life sciences: it largely suspended it. The central dogma, DNA to RNA to protein, was a deterministic framework. A gene had a sequence. That sequence determined a protein. The protein had a function. The causality ran in one direction and it ran reliably.

This was not wrong. It was incomplete. But the incompleteness was not immediately apparent because the tools of molecular biology in the 1950s, 1960s, and 1970s were bulk tools. They measured the average behavior of millions or billions of cells simultaneously. Gel electrophoresis, spectrophotometry, radioactive labeling: all of these techniques report population means. The variance across individual cells was invisible by construction, averaged away before anyone could see it.

The noise was always there. The tools just could not see it.

The Master Equation and Its Biological Implications

While molecular biologists were working with bulk measurements, a parallel development was occurring in theoretical physics and chemistry that would eventually reshape biological modeling. The chemical master equation, a differential equation describing how the probability distribution over the number of molecules of each chemical species evolves over time, was formalized during this period. It is the stochastic counterpart of deterministic rate equations: instead of tracking concentrations, it tracks probability distributions over molecular counts.

The master equation had been known in principle for decades, but its application to biological systems required a specific insight: that when molecular copy numbers are low, the distinction between concentrations and counts matters enormously. A concentration of 1 nanomolar of a transcription factor in a bacterial cell with a volume of about one femtoliter corresponds to roughly 0.6 molecules on average. You cannot have 0.6 molecules. At any instant the cell holds a whole number of copies, most often zero or one, and the difference between those two states is not a minor perturbation. It is the difference between a gene being regulated or not.
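The arithmetic behind the 0.6 figure is short enough to write out. The sketch assumes a cell volume of one femtoliter, a common round number for E. coli, and the Poisson step at the end is an added assumption for illustration only; the names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole

def molecules_per_cell(concentration_molar, cell_volume_liters):
    """Convert a molar concentration into a mean copy number per cell."""
    return concentration_molar * AVOGADRO * cell_volume_liters

ECOLI_VOLUME_L = 1e-15  # assumed: roughly one femtoliter

mean_copies = molecules_per_cell(1e-9, ECOLI_VOLUME_L)
print(f"mean copies at 1 nM: {mean_copies:.2f}")

# Assumption for illustration: if copy numbers were Poisson distributed,
# the chance that a given cell holds no copy at all would be exp(-mean).
print(f"P(zero copies) under a Poisson assumption: {math.exp(-mean_copies):.2f}")
```

Under that assumption a bit more than half of the cells contain no copy of the regulator at any given moment, which is why a concentration is a poor description of the state of a single cell.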

This insight would not become widely recognized in biology for another two decades, but the theoretical machinery to handle it was being developed in parallel, largely by physicists and chemists who were not primarily thinking about gene regulation.

The Gillespie Algorithm: Making Stochastic Simulation Computable

The practical breakthrough that made stochastic modeling of biochemical systems tractable came in 1977 with a paper by Daniel Gillespie in the Journal of Physical Chemistry. Its title was "Exact Stochastic Simulation of Coupled Chemical Reactions," and it introduced what is now universally called the Gillespie algorithm or stochastic simulation algorithm (SSA).

The problem Gillespie solved was computational. The master equation describes the time evolution of a probability distribution over all possible states of a biochemical system, but for any realistic system with more than a handful of molecular species, the state space is so large that directly solving the master equation is computationally infeasible. What Gillespie showed was that you could instead simulate individual realizations of the stochastic process exactly, by computing, at each moment, the probability that any given reaction fires next and the time until it fires, both of which can be sampled from analytically known distributions.

The algorithm works by treating each reaction as a competing random process. At each step, you calculate the propensity of every possible reaction, which is the rate constant multiplied by the number of distinct combinations of reactant molecules currently available. You then sample a random time to the next reaction from an exponential distribution with rate equal to the sum of all propensities, and you sample which reaction occurs from a categorical distribution weighted by individual propensities. You update the molecular counts and repeat.
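The loop just described fits in a few lines for the simplest possible system: one species that is produced at a constant rate and degraded in proportion to its copy number. This is a sketch of the direct method under that assumption, not code from Gillespie's paper; the function names and rate values are illustrative, and `k_make` is assumed to be positive so the total propensity never reaches zero.

```python
import random

def gillespie_birth_death(k_make, k_decay, t_end, rng, n0=0):
    """Direct-method SSA for: 0 -> mRNA (propensity k_make) and
    mRNA -> 0 (propensity k_decay * n). Assumes k_make > 0."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        # Propensity of each reaction in the current state.
        a_make = k_make
        a_decay = k_decay * n
        a_total = a_make + a_decay
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_make:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        total += n * (t_next - times[i])
    return total / t_end

rng = random.Random(1977)
times, counts = gillespie_birth_death(k_make=10.0, k_decay=1.0, t_end=500.0, rng=rng)
print(f"{len(times) - 1} reaction events, "
      f"time-averaged copy number {time_average(times, counts, 500.0):.2f}")
```

The long-run average should sit close to k_make / k_decay, the value the deterministic rate equation predicts, while the trajectory itself fluctuates around it.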

Why This Mattered

Before Gillespie, stochastic simulation of biochemical systems required either analytically solving the master equation (infeasible for large systems) or using approximations that sacrificed exactness. Gillespie's insight was that individual trajectories of the stochastic process could be simulated exactly using only elementary probability theory. The algorithm produces results that are statistically identical to what you would get from solving the master equation directly, at a computational cost that scales with the number of reaction events rather than the size of the state space. For systems with low molecule numbers, this was a transformative capability.

The immediate uptake in biology was limited. In 1977, the idea that gene expression was a stochastic process driven by small numbers of molecules was not yet established experimentally. The Gillespie algorithm was a solution to a problem that most biologists did not yet know they had.

The Neutral Theory and the Stochasticity of Evolution at the Molecular Level

In the years on either side of Gillespie's paper, and largely independently of it, evolutionary biology was being reshaped by its own confrontation with stochasticity. Motoo Kimura's neutral theory of molecular evolution, first proposed in 1968 and developed through the 1970s and 1980s, made the radical claim that the majority of molecular variation observed in DNA sequences was selectively neutral and that its evolutionary fate was governed not by selection but by random genetic drift.

Kimura's argument rested on rates. The observed rate of amino acid substitution in proteins across species was far higher than could be explained if each substitution required positive selection to fix. Neutral mutations, fixed by drift, could explain the observed rates. His mathematical framework, built on the diffusion approximation of Wright's drift model, predicted that the rate of neutral evolution should be approximately equal to the mutation rate and should be roughly constant across lineages, a prediction that the molecular clock literature was beginning to accumulate evidence for.
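The claim that the neutral substitution rate equals the mutation rate follows from two standard facts: a diploid population of size N produces 2N times the per-copy mutation rate in new neutral mutations each generation, and each new neutral copy eventually fixes with probability 1/(2N). The sketch below simply multiplies the two; the function name and the numbers are illustrative assumptions.

```python
def neutral_substitution_rate(pop_size, mutation_rate):
    """Expected neutral substitutions per generation in a diploid population."""
    new_mutations_per_generation = 2 * pop_size * mutation_rate
    fixation_probability = 1 / (2 * pop_size)  # a new neutral copy is 1 of 2N
    return new_mutations_per_generation * fixation_probability

for n in (100, 10_000, 1_000_000):
    print(n, neutral_substitution_rate(n, 1e-8))
```

Population size cancels out, which is why the theory predicts a roughly constant rate across lineages with very different population sizes.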

The neutral theory was not universally accepted and remains a subject of debate in its strong form. But its lasting contribution to stochastic thinking in biology was to establish drift as the null model for molecular evolution. Before Kimura, the default assumption was that any molecular difference between species reflected selection. After Kimura, the default question became: is there evidence that selection, rather than drift, drove this substitution? Randomness became the baseline against which adaptation had to be demonstrated, not the residual after adaptation had been explained.

Single Molecules, Single Cells: Experiments Catch Up to Theory

The critical experimental development that transformed stochastic biology from a theoretical enterprise into an empirically grounded one was the ability to observe individual molecules and individual cells.

1990s — Single-molecule detection

Physics enables direct observation of molecular events

Single-molecule fluorescence microscopy and optical tweezers made it possible to watch individual molecular machines at work: a single ribosome translating a single mRNA, a single RNA polymerase transcribing a gene, a single motor protein walking along a cytoskeletal filament. The stochastic pauses, backslides, and erratic stepping that these experiments revealed were invisible in bulk assays and impossible to predict from deterministic kinetic models.

1998 — Transcription in real time

Ko et al.: gene expression occurs in stochastic bursts

Studies showed that transcription from individual gene loci was not a continuous process running at a steady rate but occurred in discrete bursts separated by periods of silence. The mean expression level was a poor description of the temporal dynamics, which were highly variable and could only be understood by modeling the switching between active and inactive promoter states as a stochastic two-state process.

2002 — Elowitz et al. in Science

Intrinsic and extrinsic noise decomposed in living cells

Using dual fluorescent reporters in isogenic E. coli, Michael Elowitz and colleagues provided the first clean experimental decomposition of gene expression noise into intrinsic components (from the randomness of molecular reactions within a single cell) and extrinsic components (from cell-to-cell differences in the concentrations of shared regulators). This paper is the inflection point in experimental stochastic biology: it converted the theoretical prediction that gene expression is noisy into a quantified, mechanistically decomposed measurement.
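The decomposition can be sketched with the estimators usually quoted for the dual-reporter design: intrinsic noise from the mean squared difference between the two reporters, extrinsic noise from their covariance, both normalized by the product of the means. The data below are synthetic and invented purely for illustration (a shared per-cell factor times independent per-reporter factors); none of it comes from the paper, and the function names are hypothetical.

```python
import random

def noise_decomposition(reporter_a, reporter_b):
    """Return (intrinsic, extrinsic) noise from paired per-cell reporter levels."""
    n = len(reporter_a)
    mean_a = sum(reporter_a) / n
    mean_b = sum(reporter_b) / n
    pairs = list(zip(reporter_a, reporter_b))
    mean_sq_diff = sum((a - b) ** 2 for a, b in pairs) / n
    mean_product = sum(a * b for a, b in pairs) / n
    eta_int_sq = mean_sq_diff / (2 * mean_a * mean_b)
    eta_ext_sq = (mean_product - mean_a * mean_b) / (mean_a * mean_b)
    # Clamp at zero: sampling error can push the covariance slightly negative.
    return eta_int_sq ** 0.5, max(eta_ext_sq, 0.0) ** 0.5

def synthetic_cells(n_cells, rng, mean=100.0, ext_cv=0.2, int_cv=0.1):
    """Made-up data: one shared (extrinsic) factor per cell, plus an
    independent (intrinsic) factor for each of the two reporters."""
    a, b = [], []
    for _ in range(n_cells):
        shared = mean * (1 + rng.gauss(0, ext_cv))
        a.append(shared * (1 + rng.gauss(0, int_cv)))
        b.append(shared * (1 + rng.gauss(0, int_cv)))
    return a, b

rng = random.Random(2002)
a, b = synthetic_cells(20_000, rng)
eta_int, eta_ext = noise_decomposition(a, b)
print(f"intrinsic noise {eta_int:.3f}, extrinsic noise {eta_ext:.3f}")
```

With these settings the estimates should land near the values built into the synthetic data, about 0.1 for intrinsic and 0.2 for extrinsic noise. Fluctuations shared by both reporters show up only in the covariance term, and fluctuations unique to each show up only in the difference term.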

2006 — Raj et al. and single-molecule FISH

Counting individual mRNA molecules in fixed cells

Single-molecule fluorescence in situ hybridization allowed researchers to count the exact number of mRNA molecules from a specific gene in individual fixed cells. The distributions they observed were far broader than Poisson, the distribution expected if transcription were a simple random process with a constant rate. The excess variance was direct evidence of transcriptional bursting: genes switch between active and inactive states, and mRNA is produced in bursts from the active state.
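A two-state promoter model shows where the excess variance comes from. The sketch below is a stochastic simulation of a gene that switches on and off, transcribes only while on, and loses mRNA by first-order decay. It compares the Fano factor (variance divided by mean, equal to 1 for a Poisson distribution) with that of a gene that is effectively always on. All rate values and names are illustrative assumptions, not parameters fitted to any data; `k_on` and `k_tx` are assumed positive.

```python
import random

def telegraph_ssa(k_on, k_off, k_tx, k_deg, t_end, rng):
    """Simulate a two-state promoter; return the time-weighted
    (mean, variance) of the mRNA count. Assumes k_on > 0 and k_tx > 0."""
    t, gene_on, n = 0.0, False, 0
    sum_n = sum_n2 = 0.0
    while True:
        rates = [
            0.0 if gene_on else k_on,   # promoter switches on
            k_off if gene_on else 0.0,  # promoter switches off
            k_tx if gene_on else 0.0,   # transcription, only while on
            k_deg * n,                  # degradation of one mRNA
        ]
        total = sum(rates)
        dwell = rng.expovariate(total)
        last = t + dwell >= t_end
        if last:
            dwell = t_end - t  # truncate the final interval at t_end
        sum_n += n * dwell
        sum_n2 += n * n * dwell
        if last:
            break
        t += dwell
        pick = rng.random() * total
        if pick < rates[0]:
            gene_on = True
        elif pick < rates[0] + rates[1]:
            gene_on = False
        elif pick < rates[0] + rates[1] + rates[2]:
            n += 1
        else:
            n -= 1
    mean = sum_n / t_end
    return mean, sum_n2 / t_end - mean ** 2

rng = random.Random(2006)
# Both parameter sets are chosen to give a mean of about 5 mRNA per cell.
scenarios = {
    "bursty (gene on ~10% of the time)": (0.1, 0.9, 50.0, 1.0),
    "constitutive (gene stays on)": (1000.0, 0.0, 5.0, 1.0),
}
for label, params in scenarios.items():
    mean, var = telegraph_ssa(*params, t_end=2000.0, rng=rng)
    print(f"{label}: mean={mean:.2f}, Fano factor={var / mean:.2f}")
```

The two scenarios have similar means but very different spreads: the constitutive gene stays close to a Fano factor of 1, while the bursty gene gives a value many times larger.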

2008 to present — Single-cell sequencing

Stochasticity at genome scale

Single-cell RNA sequencing made it possible to measure the transcriptome of individual cells at genome-wide scale. The cell-to-cell variability revealed by these experiments was not experimental noise. It was biological signal: different cells in the same tissue expressing different transcriptional programs, transitional cell states, rare subpopulations with distinct identities, stochastic commitment to differentiation pathways. The computational challenge of analyzing single-cell data is, in large part, the challenge of distinguishing biological stochasticity from technical noise, a distinction that requires both statistical rigor and biological judgment.

Theoretical Biology Responds: From ODE to SSA

As experimental evidence for biological stochasticity accumulated, theoretical and computational biology had to respond with new frameworks. The dominant paradigm for modeling biochemical systems through most of the twentieth century was the ordinary differential equation: write down rate equations for each molecular species, solve them numerically, read off the concentrations as a function of time. This framework is mathematically clean, computationally efficient, and completely deterministic.

For large numbers of molecules, it is also accurate. The law of large numbers guarantees that the stochastic fluctuations of a large-copy-number system will average out to behavior closely approximated by the deterministic equations. But for the small molecular populations that characterize many critical regulatory events in cells, the approximation breaks down badly. A deterministic ODE model of a genetic toggle switch, for example, will predict that the switch settles into one of two stable states and stays there indefinitely. A stochastic simulation of the same system will show spontaneous switching between states, at a rate set by the height of the effective barrier separating the two states relative to the noise amplitude. These are qualitatively different predictions with qualitatively different biological implications.

The transition from ODE-based to stochastic models in systems biology over the 1990s and 2000s was not smooth. It required new software tools: SBML for model representation, COPASI and StochKit for stochastic simulation, BioNetGen for rule-based modeling of complex signaling networks with combinatorial complexity. It required new mathematical frameworks: the linear noise approximation for computing noise statistics analytically, moment closure methods for systems where even the SSA is too slow, and hybrid methods that apply stochastic simulation to low-copy-number species while using deterministic equations for high-copy-number species.

The shift from ODEs to stochastic simulation was not just a technical upgrade. It was a philosophical one: accepting that the variance of a biological system is as informative as its mean, and that any model that discards the variance is discarding biology.

Stochastic Approaches Today: From Single Cells to Machine Learning

Where does the history of stochastic biology stand in 2026? The field has matured substantially, but it has also bifurcated in ways that are worth noting honestly.

The Stochastic Systems Biology Community

There is now a well-developed community working on stochastic modeling of gene regulatory networks, signaling pathways, and cell fate decisions. The Gillespie algorithm and its variants, including tau-leaping for faster approximate simulation and the next reaction method for improved efficiency, are standard tools. The theoretical frameworks for analyzing stochastic dynamical systems, including the chemical Langevin equation, the system size expansion, and information-theoretic measures of noise, are mature. Groups like those of Johan Paulsson at Harvard and Arjun Raj at the University of Pennsylvania have used these tools to make precise quantitative predictions about how cells regulate and exploit noise, predictions that have been tested and confirmed experimentally.

The Single-Cell Genomics Community

In parallel, the single-cell genomics community has built a largely separate set of tools for analyzing cell-to-cell variability at the transcriptomic level. Trajectory inference, RNA velocity, pseudotime ordering, and latent variable models are all, at their core, attempts to make sense of the stochastic spread of cells through transcriptional state space. The connections between this community and the mechanistic stochastic modeling community are weaker than they should be. Single-cell tools tend to be phenomenological: they describe the distribution of cellular states without necessarily explaining the molecular mechanisms generating that distribution. Mechanistic stochastic models explain the mechanisms but are typically too computationally expensive to apply at the scale of thousands of genes and millions of cells.

The Machine Learning Interface

The most recent development, and the one that connects this history to the current moment, is the application of machine learning to stochastic biological data. Models trained on single-cell data must implicitly or explicitly handle the fact that the data generating process is stochastic. Foundation models for biology, trained on sequence or expression data at enormous scale, are learning representations of a noisy biological reality. The question of whether those representations capture the biological signal or the noise structure of the training data is not a technical footnote. It is the central epistemological question for the field.

The history of stochastic biology offers a lesson here that is worth internalizing. Every time a new experimental technology exposed a new layer of biological variability, the initial response was to treat that variability as a problem to be solved rather than a phenomenon to be modeled. Single-molecule experiments were initially seen as too noisy to be informative. Single-cell sequencing data was initially dominated by discussion of dropout artifacts and batch effects. The stochastic variation that turned out to be biological signal was indistinguishable from technical noise until the field developed both the tools and the intellectual frameworks to tell them apart.

We are in an analogous moment with AI in biology. The variance in model behavior across runs, the uncertainty in model predictions, the distributional spread of outputs across biological contexts: these are being treated in many quarters as engineering problems to be solved by better training, larger datasets, or more sophisticated architectures. Some of that variance is technical. Some of it is the model honestly reflecting the stochastic nature of the biology it was trained on. Telling the difference requires the same combination of quantitative rigor and biological judgment that has been required at every previous transition in the history of stochastic biology.

What This History Teaches

Three things stand out from a century and a half of stochastic thinking in biology, and they are as relevant now as they were when Mendel first counted his peas.

First, the tools for seeing variability consistently arrive before the frameworks for interpreting it. Mendel had stochastic outcomes from his crosses but did not have the language of probability distributions. Brown had Brownian motion but not the statistical mechanics to explain it. The molecular biologists of the 1960s had bulk measurements that averaged over stochastic cell-to-cell variation they could not yet see. Single-cell sequencing is now generating variation at a scale that the field's interpretive frameworks are still catching up to.

Second, resistance to stochastic explanations has consistently come from the same place: the appeal of simpler, deterministic accounts. Deterministic models are more tractable mathematically, more intuitive to reason about, and easier to communicate. The pressure toward deterministic explanations in biology is not intellectual laziness. It reflects a real trade-off between model tractability and biological realism. The history suggests that this trade-off is frequently resolved in favor of tractability for too long, and that the field pays a price in predictive accuracy and mechanistic understanding.

Third, the most productive advances in stochastic biology have come not from treating noise as an obstacle but from asking what the noise itself is telling you. Wright asked what the variance in allele frequencies across populations revealed about effective population size and the relative importance of drift and selection. Elowitz asked what the decomposition of gene expression noise into intrinsic and extrinsic components revealed about the architecture of gene regulatory networks. In both cases, the answer was: quite a lot, and more than the mean ever could.

That orientation, treating variance as signal rather than as noise to be filtered, is the core intellectual stance of stochastic biology. It was hard-won over more than a century of resistance and reformulation. It is the stance that the current generation of biological AI tools most urgently needs to adopt.

Key References

[01] Mendel GJ. Versuche über Pflanzenhybriden. Verhandlungen des naturforschenden Vereines in Brünn. 1866;4:3-47. The foundational paper in genetics, which is also, in retrospect, the first probabilistic model of a biological process.

[02] Einstein A. Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annalen der Physik. 1905;17:549-560. Einstein's paper on Brownian motion, which established the mathematical framework for diffusive random processes underlying molecular movement in cells.

[03] Wright S. Evolution in Mendelian populations. Genetics. 1931;16(2):97-159. The paper that introduced genetic drift as a quantitative mechanism in population genetics, establishing randomness as a first-class driver of evolutionary change.

[04] Kimura M. Evolutionary rate at the molecular level. Nature. 1968;217:624-626. The original neutral theory paper, proposing that most molecular evolution is driven by drift rather than selection and establishing randomness as the null model for molecular evolution.

[05] Gillespie DT. Exact stochastic simulation of coupled chemical reactions. Journal of Physical Chemistry. 1977;81(25):2340-2361. The paper that introduced the stochastic simulation algorithm, making exact simulation of biochemical reaction networks computationally tractable for the first time.

[06] Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002;297(5584):1183-1186. The experimental paper that decomposed gene expression noise into intrinsic and extrinsic components in living cells, marking the inflection point in experimental stochastic biology.

[07] Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLOS Biology. 2006;4(10):e309. The single-molecule FISH paper that quantified transcriptional bursting in mammalian cells, showing mRNA count distributions are broader than Poisson and consistent with a two-state promoter model.

[08] Paulsson J. Summing up the noise in gene networks. Nature. 2004;427:415-418. A theoretical treatment of how noise propagates through gene regulatory networks, establishing that the variance of a regulatory output is as informative about network architecture as its mean.

[09] Taniguchi Y, et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010;329(5991):533-538. A landmark study providing genome-scale measurement of protein and mRNA levels in single E. coli cells, revealing extensive cell-to-cell variation and quantifying the relationship between transcriptional and translational noise.

[10] Macosko EZ, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161(5):1202-1214. A foundational single-cell RNA-seq paper demonstrating high-throughput profiling of transcriptional heterogeneity, opening the door to genome-scale characterization of stochastic cell states.

[11] Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ. Generalizing RNA velocity to transient cell states through dynamical modeling. Nature Biotechnology. 2020;38:1408-1414. The RNA velocity paper, a direct application of stochastic kinetic modeling of transcription and splicing to infer the direction of cell state transitions from snapshot single-cell data.

[12] Van Kampen NG. Stochastic Processes in Physics and Chemistry. 3rd ed. Elsevier; 2007. The standard reference text for the mathematical theory of stochastic processes in the physical and biological sciences, covering the master equation, Fokker-Planck equation, Langevin equation, and their biological applications.

Blaise Manga Enuh, PhD

Computational biologist and bioinformatics engineer at the Great Lakes Bioenergy Research Center. I build ML models, bioinformatics pipelines, and scientific software tools at the intersection of microbial biology and machine learning.
