I have been thinking about the scientific method, again.
Way back in the day, the 1980s as it happens, I was a graduate student learning the research trade. Alongside the admiration for big hair and shoulder pads, there were skills to practise, theory to imbibe, and a philosophy to master.
The research philosophy I was taught as the best, and really the only, way to do science was deductive reasoning. That is, arriving at a logical conclusion about why something happens by first constructing a hypothesis and then examining the outcomes of testing it, after a manipulation or two, on replicate subjects.
This is the classical scientific method of hypothesis testing through experimentation.
Observation leads to a hypothesis of what could be happening. Then an experimental test of the hypothesis is constructed by manipulating one or more factors and measuring one or more response variables. Results are analysed for significance with statistical tests, which involve further hypotheses of their own, and these tests are used to interpret the results and accept or reject the original hypothesis. Then rinse and repeat until enough tests are completed to generate a theory of how nature works.
A+B=C so long as all else is equal.
Most of my early research followed this search for deterministic explanations of ecological patterns. The naive assumption was that causality could be discerned from the experiments we could conduct. The search was for ways to manipulate a subset of subjects in a known way (feeding them more high-quality food, for example), compare their responses with those of control subjects not fed the extra food, and then use the comparison to test the hypothesis.
The challenge was to find a sufficient number of subjects who could be randomly assigned to a treatment or control group. There had to be enough subjects to make the comparison meaningful and then apply a manipulation realistically linked to the hypothesis.
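Random assignment itself is the easy part once you have the subjects. Below is a minimal sketch in Python of shuffling a pool of subjects into treatment and control groups. The function name `randomly_assign` and the 30 numbered plots in the usage line are hypothetical, chosen to echo the replicate problem discussed next. The sketch assumes the subjects really are interchangeable replicates, which is exactly the assumption field ecology struggles to meet.

```python
import random


def randomly_assign(subject_ids, seed=None):
    """Shuffle subjects and split them into treatment and control groups.

    Hypothetical helper: with an odd number of subjects, the control
    group receives the extra one.
    """
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Usage: 30 hypothetical plots, numbered 0..29, split into two groups of 15.
treatment, control = randomly_assign(range(30), seed=1)
```

The randomisation only protects the comparison if each plot could equally well have landed in either group. If the plots differ in ways the experimenter cannot see, shuffling cannot remove those differences. It can only spread them between the groups by chance.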
None of these is easy to do in field ecology. Just try to find 30 ecological communities of the same type, similar enough to call replicates. It’s very difficult. Arguably impossible, as any given assemblage of organisms in a given place is unique, never to be replicated anywhere.
Equally challenging, if not often admitted, is what to manipulate. Suppose your hypothesis is that more species in a grassland community make biomass production more resilient to moisture stress: a subset of the species diversity and resilience hypothesis. How do you manipulate species number? Fiddling with real communities would be hard, as exactly how many species they contain is typically not known. So you could build communities from scratch and plant more species into some than into others. This is what David Tilman famously did in a series of experiments in Minnesota grasslands.
Tilman D, Wedin D & Knops J (1996) Productivity and sustainability influenced by biodiversity in grassland ecosystems. Nature 379: 718-720
Ah, but the diligent reader will have spotted that this particular hypothesis was really about moisture stress, which can be manipulated without playing with species number. All you need to know is how many species are present in each “replicate”; then liberal watering and the use of umbrellas during rain could generate suitable manipulations.
And so it went.
Many an attempt was made to either recreate ecology in the laboratory or find ways to manipulate it in the field but, in reality, most of it was just a bad case of physics envy. Deterministic science is very hard to do in ecology. And if you agree with the likely notion that no ecological subject can truly be replicated because every phenotype is unique, then deterministic science in ecology is impossible.
Here is a classic example.
What we thought was an elegant test of how millipedes influence nutrient uptake turned into a disaster when the roof of the glasshouse leaked in a storm. The lush patches of greenery in this picture correlated nicely with drips from the roof.
Another way to think: inductive reasoning
This quote from Steven Pinker makes the scientific method about something quite different.
“Inducing generalizable patterns from a finite set of observations is the stock in trade of the scientist”
Steven Pinker, “The Better Angels of Our Nature: The Decline of Violence In History And Its Causes”
Pinker is describing inductive reasoning, where conclusions are drawn from the careful examination of observations, ideally lots of them.
Information is gathered without any predetermined manipulation of the subjects using whatever measurement instruments and techniques to hand. The assumption is that within the information there are patterns that can tell us something we need to know.
Deductive reasoners laugh at this notion.
“So you found a pattern”, they will say, “but how do you know what caused it? That correlation could come about for a dozen different reasons. You can’t ever know for sure which one is true. It’s not even science.”
Inductive reasoners will reply, often a little sheepishly, that there are some things you cannot experiment on. There are no replicates of the earth to form a control group, and anyway, we have likelihoods. We have repeated that correlative probability dozens of times: middle-aged men are fatter than 20-somethings, and they exercise less. So there.
Whilst I struggled along as a sheep, trying to squeeze my ecological research into the deductive straitjacket and fighting the truth that ecological subjects are never truly replicable, my instinct was with inductive reasoning. And it still is.
The strength in a pattern is that differences, trends, and associations can be more or less likely; they have a probability. Given big enough datasets, the causal options can be inferred with ever greater confidence.
If age and exercise correlate consistently with weight, again and again, then the correlation is ever more likely to be causal or, at the very least, chance becomes an ever less plausible explanation and there is a cause somewhere. Throw in some logic alongside Occam’s razor and weak inference gets a lot stronger.
What happens when we are free to use a pattern?
Rather than set up a grazing experiment, why not observe hundreds or thousands of pastures with known grazing history and site attributes (soil, weather, vegetation type and the like) for long enough for each to experience moisture stress. Heck, there are even multivariate tools that can unpack the seemingly endless correlative options.
Why not? Well, there are only so many hours in the day of a research ecologist. The ‘likelihood from observation’ philosophy needs measurements, lots of them, on many subjects.
Most importantly, the subjects need to be independent. I need to weigh thousands of men of different ages, but they can’t all live in Bristol, lest I fall into the correlative trap my deductive colleagues have set for me. This means that observations need to be both numerous and intelligently structured. If the subject is a grassland, then I will need to observe hundreds of them for a long time before I can attribute any patterns to a test of the diversity hypothesis.
Until recently, this data volume problem made inductive science the weak cousin of deductive science, especially in the naturally variable world of ecology.
Enter big data
We are at the beginning of the machine learning world of information overload. In ecology, this means remote sensing of ecological attributes, from ground cover to vegetation structure to moisture levels. It means internet-enabled sensors that can capture data on the ground over long periods. It also means data handling and interpretation capabilities beyond the wildest dreams of a 1980s ecologist staring at the clouds.
The finite set of observations just got a lot bigger, as did the ability to induce patterns in it.
Commerce loves this, of course, because information is not just power; it also helps sell things to people. Big data is popular because of this, but the true value to humanity of number gathering and handling at terabyte scale is that we can now carry out inductive reasoning properly. We can take finding and interpreting patterns to a whole new level.
We just have one more challenge to overcome.
The attraction of deductive reasoning is the simplicity of its logic. It aligns with mathematics and engineering, and so the strong inference that experiments generate has utility. It is also easy(ish) to do. It is possible to teach grade-school kids how to complete an experiment from start to finish, from observation and hypothesis formation all the way to the statistics of difference.
It is not so easy to teach probability.
Inference from patterns needs skills in likelihood, such as deciding whether all those tubby old dudes are heavier than the young bucks simply by chance. It also needs intuition in choosing the right tests of what is probable rather than merely possible.
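One way to put a number on “simply by chance” is a permutation test. This is a swapped-in illustration rather than a method the post prescribes. The idea is to pool the weights, shuffle the age labels many times, and count how often a random relabelling produces a difference in means at least as large as the one actually observed. The minimal Python sketch below assumes independent subjects (no Bristol-only samples). The function name `permutation_test` and the weights in the usage lines are hypothetical, made-up numbers for illustration only.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test of whether mean(group_a) exceeds mean(group_b).

    Returns the observed difference in means and an estimated p-value: the
    share of random relabellings whose difference is at least as large.
    """
    rng = random.Random(seed)
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # break any real link between age group and weight
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        if diff >= observed:
            count += 1
    # Adding one to numerator and denominator keeps the estimate above zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Usage with hypothetical weights (kg); these are not real measurements.
older = [88, 92, 85, 95, 90, 87, 93, 91]
younger = [76, 80, 78, 82, 75, 79, 81, 77]
difference, p = permutation_test(older, younger)
```

A small `p` says only that chance is an unlikely explanation for the gap. It says nothing about which cause produced it, which is where the logic and Occam come back in.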
This is where we are now.
Vast amounts of ecological data are available. For example, many modern tractors have sensors that record where they are, what they’re doing, the properties of the ground beneath them, and whether the rear differential needs some oil. We have satellites and sensors generating terabytes of detail.
So it’s time, time to induce those generalised patterns. Time to unleash all that pent-up intuition needed for inductive reasoning and to learn some likelihood skills.
If these thoughts on the scientific method resonate, even a little, please share this post or at least the idea. It might make a small difference.
If they resonate more, please contact us; we would love to hear from you.