Adam P Cribbs

Computational biologist at the University of Oxford

Why are biologists afraid of statistics and how to overcome it

Written on May 2, 2020

During my time as a PhD student, a PI at the institute I worked in said he chose to become a biologist because he was awful at maths. It got me wondering how many others felt the same way. When I asked around the institute, the most common response was “Statistics is hard and I don’t even know how to implement statistics properly”.

This worried me because if the people running laboratory experiments are not confident in running the correct statistical tests, then how can they be confident in their results? Are all biologists scared of maths? I personally think the majority are, but I also think this shouldn’t be the case, as the statistics we tend to rely on are not terribly complicated.

In my quest to identify the main reasons why biologists think like this, I thought I would look to see if there is actually any evidence to back up my theory. A quick PubMed search pulled up a blinding article by Tim Fawcett and Andrew Higginson from the University of Bristol, who found that other biological scientists rarely refer to scientific articles that present statistical equations. Moreover, the most maths-heavy articles are referenced around 50% less often than those with little or no maths. This points to a real, inherent problem of either not understanding the maths or ignoring it. Both are problematic, as basic statistical knowledge is an integral part of biology, especially now that we are entering the era of big data and generating more complicated analysis tools.

If I think about it more, I’m not surprised by these findings, because biologists are generally drawn to biology by their fondness for experimentally driven observations and their passion for spotting visual patterns in data (OK, you can accuse me of generalising here). I am reminded of another quote, from a previous colleague who said, “Using statistics to prove something is biologically relevant is pointless, if it looks meaningful then it probably is”. I think what he was trying to convey was his preference for effect size over the p value. However, I never really questioned him about this.
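To make the difference between an effect size and a p value concrete, here is a minimal Python sketch using only the standard library. The measurements are made-up illustrative numbers, not real data, and the function names `cohens_d` and `permutation_p_value` are my own choices for this example. It computes Cohen's d (a standardised effect size) and a two-sided permutation p value for the difference in means between two groups.

```python
import random
import statistics


def cohens_d(a, b):
    """Standardised mean difference, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5


def permutation_p_value(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        # Randomly reassign the group labels and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:len(a)])
                   - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical expression values for a treated and a control group.
treated = [5.1, 5.8, 6.2, 5.9, 6.5, 6.0]
control = [4.2, 4.9, 5.0, 4.6, 5.3, 4.8]

d = cohens_d(treated, control)
p = permutation_p_value(treated, control)
print(f"Cohen's d = {d:.2f}, permutation p = {p:.4f}")
```

The two numbers answer different questions: d describes how large the difference is relative to the spread of the data, while p describes how surprising a difference that large would be if the group labels were arbitrary. With a big enough sample, a very small d can still produce a small p, which is one reason to report both.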

My own personal experience with statistics has also been a fraught one. It took me a number of years during my PhD to even understand when to use a parametric test and when to use a non-parametric test on simple data. Ultimately, I think it stemmed from a lack of experimentally relevant statistical training. I remember sitting through statistics lectures during undergrad and being quite confused and bored; none of the lectures seemed to put the statistical learning in the context of biological analysis. The main reason for this was probably that most statisticians have next to zero biological background. I suppose this leads to my first point about how statistics could be better taught to students: it should be taught in the context of biological analysis. Unless this approach is taken, statistics can seem very confusing and off-putting.
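As an illustration of the parametric versus non-parametric distinction on simple data, the sketch below compares a statistic built from means and variances (Welch's t) with one built only from ranks (Mann-Whitney U). It uses the Python standard library, the values are invented for the example, and the function names are my own. It only computes the test statistics, not p values, to show how each one reacts to a single extreme measurement.

```python
import statistics


def welch_t(a, b):
    """Welch's t statistic (parametric: built from means and variances)."""
    se = (statistics.variance(a) / len(a)
          + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def mann_whitney_u(a, b):
    """Mann-Whitney U for group a (non-parametric: depends on ordering only).

    Counts the pairs where a value in a exceeds a value in b; ties count half.
    """
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)


# Hypothetical measurements; the last treated value is then replaced by an
# extreme one, as might happen with a mis-recorded reading.
control = [4.2, 4.9, 5.0, 4.6, 5.3, 4.8]
treated = [5.5, 5.8, 6.2, 5.9, 6.5, 6.0]
treated_with_outlier = treated[:-1] + [60.0]

for label, group in [("clean", treated), ("outlier", treated_with_outlier)]:
    t = welch_t(group, control)
    u = mann_whitney_u(group, control)
    print(f"{label}: Welch t = {t:.2f}, Mann-Whitney U = {u}")
```

In this toy example the rank-based statistic does not change when the outlier is introduced, because the ordering of the two groups is the same, whereas the t statistic shrinks because the outlier inflates the variance. This is not a rule for choosing a test, which also depends on sample size and the distribution of the data, but it shows why the choice matters.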

When I undertook a fellowship in computational biology a number of years back, I knew statistics would be vital for the analysis of complicated data. However, I didn’t appreciate the proportion of time I would spend choosing the correct statistical approach and then implementing it. Before my fellowship, I had the preconception that my time would be split into 50% coding, 40% analysis of data and 10% statistics. In practice, statistics is a much larger component of my work (I estimate ~30-40% of my time). I think this points towards how important statistics has become in biology. Following the completion of the Human Genome Project, we are collecting data at what seems to be an exponential rate. The analysis of this data requires new statistical approaches and a new understanding of the methods that are relevant for a particular piece of data. This is becoming even more pressing with the advances taking place in machine learning. These new techniques are being applied to answer complicated biological research questions, and we need to understand the statistics behind them so that we can apply them more confidently.

When I started my fellowship, I realised quite quickly that I needed a solid foundation in statistics to overcome my fear, which meant starting from the beginning. My aim in the first year was to undertake courses in basic statistics. I had looked at the Open University’s Practical Modern Statistics because other computational fellows had taken this course and found it really helpful. However, the year I joined, the course fees were substantially increased, making this option less favourable. As an alternative, I thought I would try the free Coursera online courses in basic statistics. This was a really well-designed introduction to statistics and will be helpful to any biologist regardless of their background. However, I finished it wanting more advanced learning. Again I relied on Coursera, taking Bayesian Statistics: From Concept to Data Analysis and Statistical Inference. The main benefit of these courses was that I could learn at a pace that suited me. This gave me the confidence to learn effectively and then apply the techniques to my own data. Starting simple is the key to demystifying statistics. It’s an uphill struggle, so be persistent.

I hear a lot of people talking about why biologists are poor at maths and stats, and this is sometimes attributed to biological thinking being inherently incompatible with mathematical thinking. However, I think this is largely unfounded, because when you really delve into the reasons why biologists find statistics confusing, it is because they either don’t think it’s relevant and so are less likely to pay attention (it certainly is relevant, and this is a worrying sign that they don’t understand the data they have analysed) or haven’t been taught it correctly. The rise of MOOCs (Massive Open Online Courses), and of people promoting the benefits of these courses, will certainly help to bring biological and statistical approaches together.