You’re an established scientist in the biological domain and you’re keen on knowing how you can transition from a biologist to a data scientist.
Here’s a short answer:
Yes, biologists can transition to become data scientists. Biology is becoming an increasingly quantitative and data-heavy domain. Fields such as biomedical data science, bioinformatics, and computational biology already quantify life through data, which makes the move from biology to data science smoother.
When I say that biologists are able to transition, I mean that there are areas of overlap between these two fields.
I’m honestly no expert in either of these areas, but I’ve done some work in each of them.
During my observation of work in these areas, I picked up some aspects where biology could overlap with data science.
If you’re a biologist aspiring to enter data science, check out the 3 reasons below!
Why Can Biologists Transition to Data Science?
1. Biological Data is Complex
Biology is very much the study of life, exploring all the different systems in your body that react and respond to each other to make you, you!
To look into these systems and study them, one requires a strong understanding of complex systems and how they can be explained through the data collected.
DNA is so information-dense that a single gram of it could in theory store 455 exabytes of data, according to New Scientist.
The study of such large, intricate complexities hidden within biological data requires a strong understanding of how to properly clean, manipulate and preserve the right data for proper insight.
These are just some of the areas where I found that a biologist’s experience carries over well to data science.
2. Biology is Becoming Increasingly Computational
Although biologists may seem to lack transferable skills, they actually have several that make them good candidates for data science.
Of course, a proper transition would not be an immediate one, and developing these skills further will help smooth it out.
As mentioned in the previous point, working with data in biology requires a data manipulation toolkit similar to that of a data scientist.
To play around with data, biologists tend to use programming languages like Python and R for their flexibility in conducting statistical tests.
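As a rough illustration of what such a statistical test can look like in Python, here’s a minimal sketch of an exact permutation test on the difference between two group means. The expression values are made up, and `permutation_test` is a hypothetical helper name rather than a function from any library. It uses only the standard library, so treat it as a sketch and not as the way any particular lab does it.

```python
from itertools import combinations
from statistics import mean


def permutation_test(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Returns the observed absolute difference and the fraction of all
    possible regroupings whose difference is at least as large.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    # Try every way of splitting the pooled values into two groups.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return observed, extreme / total


# Made-up expression values for two conditions (illustration only).
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [5.0, 5.2, 4.8, 5.1, 4.9]

diff, p_value = permutation_test(control, treated)
print(f"difference in means: {diff:.2f}, p-value: {p_value:.4f}")
```

With only five values per group there are 252 possible regroupings, so checking all of them is cheap. For larger datasets you would normally reach for a statistics library instead.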
Want to know more about Python in biology?
Check out this other article I wrote on Python and Biology!
3. Scientific Thinking
What’s the main similarity between a biologist and a data scientist, you say? Based on what I’ve observed from my interactions with both sides, they’re both scientists! That means that the same scientific thinking is present in both fields of work.
For example, the scientific methodology of having an initial hypothesis and testing against that hypothesis is a common thought process in both fields.
If you’re a biologist, you already know what’s next. The data has to be collected through some means and stored somewhere.
When that’s complete you’d have to pull up these data and have a look at them for initial data exploration.
Then comes the cleaning and removal of possible outliers due to error. Lastly, all the data is put together in digestible charts and put in a report. Now, does that all sound similar to the data science pipeline?
For comparison, here’s an article I’ve found about the data science pipeline. There are many points mentioned there but I’d like to draw similarities to the flow:
Acquiring data > Cleaning data > Data exploration > Data modeling > Data interpretation
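To show how that flow might map onto code, here’s a minimal sketch in Python with one small function per stage. Everything in it is an assumption for illustration: the dose and growth-rate numbers are made up, the function names are hypothetical, and the "model" is just a straight-line fit by least squares.

```python
import csv
import io
from statistics import mean

# Made-up measurements; one row has a missing reading on purpose.
RAW = """dose_um,growth_rate
0,1.00
1,0.90
2,
4,0.60
8,0.20
"""


def acquire(text):
    """Acquiring data: read the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning data: keep only rows where both values parse as numbers."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((float(row["dose_um"]), float(row["growth_rate"])))
        except (TypeError, ValueError):
            continue  # skip rows with missing or malformed values
    return cleaned


def explore(pairs):
    """Data exploration: quick summary of the response variable."""
    ys = [y for _, y in pairs]
    return {"n": len(pairs), "min": min(ys), "max": max(ys), "mean": mean(ys)}


def model(pairs):
    """Data modeling: fit y = intercept + slope * x by least squares."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def interpret(slope):
    """Data interpretation: turn the fitted slope into a plain sentence."""
    direction = "decreases" if slope < 0 else "increases"
    return f"Growth rate {direction} by about {abs(slope):.2f} per unit of dose."


pairs = clean(acquire(RAW))
summary = explore(pairs)
slope, intercept = model(pairs)
print(summary)
print(interpret(slope))
```

Each function here corresponds to one arrow in the flow above, which is also roughly how a lab analysis is already organized.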
What Areas Connect Biology to Data Science?
1. Bioinformatics
According to an article I found in the “Science, medicine, and the future” series, bioinformatics can be seen as the application of computational approaches to better understand biological data.
I would say that this field is still very much biology-focused in terms of results.
The only thing is that the methods used, such as cleaning and processing data, are similar to those of data science.
You would likely be looking at large gene datasets, mining for possible biological insights that are useful in understanding how genes work.
Of course, the data is not limited to genes and can extend to things like protein expression levels.
Additionally, some bioinformaticians can work on creating algorithms for bioinformatics tools in processing data.
Based on my conversations with a few bioinformaticians, this small field sits right at the intersection of data science, biology, and computer science.
This should be a perfect transition field for a general biologist to step into data science.
You’re going to want to learn some R and Bash to get some substantial work done in this area.
Learning R for the first time?
Find out how long it takes to learn R in my other article here.
2. Biomedical Data Science
Based on this article I’ve found, this field of biomedical data science is a rather new term coined by data scientists who work on biomedical data.
Essentially, this is an intersection where biological and medical knowledge is combined with data science to better uncover insights from data for use in biomedicine.
Much of the data is used to drive improvements in healthcare, which separates this field from bioinformatics, an older field. I would say that this area is still growing, with more universities expanding their departments.
For example, Harvard University started its research Department in Biomedical Data Science in 2015 to meet this demand.
At Nanyang Technological University, Singapore, a Master’s course in Biomedical Data Science was also recently set up to meet the same demand.
If you’re looking for a field that can still make use of your previous biomedical knowledge, here’s a field you might want to look into!
I’d recommend starting with the IBM Data Science Professional Certificate!
Read my review of it here.
Or if you’re thinking of going the data analytics route, you can use these 9 smarter ways to get started!
3. Computational Biology
Computational biologists use mathematical models to predict outcomes and to understand how molecules interact.
You would typically see this field applied in drug development and the study of protein interactions.
Due to the quantitative nature of this field, having a strong mathematics background is essential. That background is also a great asset in data science, especially when handling machine learning models.
How Can a Biologist Transition to Data Science?
1. Work on Basic Statistics and Programming Exercises
For starters, you might want to work through some basic statistics and coding exercises to train your technical skills, which will be useful in a technical interview.
Some really good places to start learning would be DataCamp and Coursera. I would recommend taking the introductory courses and going through the basic practice questions to get your foundations right.
With these two online learning sites, you should be able to find a wide variety of topics on data science. I would personally go for the courses in Python, R, and basic statistics.
Personally, I took the Google Data Analytics Professional Certificate to get extra training in R programming. You can check out my review of that course over here.
2. Apply Computational Methods To Your Project
“For the things we have to learn before we can do them, we learn by doing them.”― Aristotle, The Nicomachean Ethics
I have always believed that learning by doing is by far one of the fastest ways to learn. Rather than just staring at boring lecture content, try applying some of that newfound knowledge to the project you’re in.
For example, if you’re doing wet-lab benchwork for your biology project, try to incorporate some programming scripts in your data analysis after the data collection is done.
You can pick up some useful languages such as R and Bash much faster through this method.
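As one possible example of what such a script could look like, here’s a minimal sketch in Python that summarizes replicate readings per sample after a benchwork run. The sample names and absorbance values are made up, and `summarise` is a hypothetical helper, so adapt it to whatever your instrument actually exports.

```python
from collections import defaultdict
from statistics import mean, stdev

# Made-up triplicate absorbance readings as (sample, value) pairs.
readings = [
    ("control", 0.52), ("control", 0.50), ("control", 0.54),
    ("drug_a", 0.31), ("drug_a", 0.29), ("drug_a", 0.33),
]


def summarise(readings):
    """Group readings by sample and return (mean, sample SD) per sample."""
    grouped = defaultdict(list)
    for sample, value in readings:
        grouped[sample].append(value)
    return {
        sample: (mean(values), stdev(values))
        for sample, values in grouped.items()
    }


results = summarise(readings)
for sample, (avg, sd) in results.items():
    print(f"{sample}: mean={avg:.3f}, sd={sd:.3f}")
```

Even a small script like this replaces a lot of manual spreadsheet work, and it is easy to grow once you are comfortable with it.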
Want to know how long a data science project would take?
Read this article to find out!
3. Learn from YouTube
If you’re very much a visual learner like me or if you need someone to guide you through your very first project in bioinformatics or data science, you should be looking to YouTube for your learning content.
What I would personally recommend is the Data Professor YouTube channel. This channel is run by a bioinformatics professor who transitioned from biology into data science, and you should really check it out.
Here’s a video that I think would help you out in your transition to data science.
The field of data science is still rather new and the entry requirements might still be flexible at this point (2020). That means that you are definitely able to make that transition out of biology if you take the necessary training seriously and start applying data science approaches in your daily biology work. Thanks for reading!
My Favorite Learning Resources:
My Recommended Learning Platforms!
| # | Learning Platform | What’s Good About the Platform? |
|---|---|---|
| 1 | Coursera | Certificates are offered by popular learning institutes and companies like Google & IBM |
| 2 | DataCamp | Comes with an integrated coding platform, great for beginners! |
| 3 | Pluralsight | Strong focus on data skills, taught by industry experts |
| 4 | Stratascratch | Learn faster by doing real interview coding practices for data science |
| 5 | Udacity | High-quality, comprehensive courses |
My Recommended Online Courses + Books!
| # | Topic | Online Course | Book |
|---|---|---|---|
| 1 | Data Analytics | Google Data Analytics Professional Certificate | – |
| 2 | Data Science | IBM Data Science Professional Certificate | – |
| 3 | Excel | Excel Skills for Business Specialization | – |
| 4 | Python | Python for Everybody Specialization | Python for Data Analysis |
| 5 | SQL | Introduction to SQL | SQL: The Ultimate Beginners Guide: Learn SQL Today |
| 6 | Tableau | Data Visualization with Tableau | Practical Tableau |
| 7 | Power BI | Getting Started with Power BI Desktop | Beginning Microsoft Power BI |
| 8 | R Programming | Data Science: Foundations using R Specialization | Learning R |
| 9 | Data Visualization | – | Big Book of Dashboards |