If you have spoken to data analysts and data scientists recently, you have probably heard the same observation more than once: Python is becoming more popular than R.
That’s why I sat down to put together this blog post, to share the possible reasons why Python has been the more popular of the two over the past few years.
Here’s the short answer:
Python is used more than R in data science because of its large community of existing users and its simpler syntax. Python is also a general-purpose programming language, which makes it more versatile. These advantages lower the barrier to entry for beginners and have led to Python being used more than R.
However, alongside this popularity contest there is another question to ask: “What was each language originally created for?”
Knowing the design intention of the creators of these wonderful programming languages can perhaps reveal more about why one is used more than the other.
What Was Python Created For?
In the PYPL Popularity of Programming Language index for September 2020, Python ranked 1st and R ranked 7th worldwide, with trends of 2.9% and 0.3% respectively, so Python comes out as the more widely used of the two.
1. Readability and ease of understanding
Python was designed so that code is written in a readable way, letting programmers and new users understand it quickly.
This readability reduces the time programmers spend writing and understanding code, so applications in Python can be built faster and more efficiently.
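As a small illustration of that readability, here is a minimal sketch that filters and averages a list of scores. The function name, variable names, and sample data are all made up for this example.

```python
def average(scores):
    """Return the arithmetic mean of a list of numbers."""
    return sum(scores) / len(scores)


# Hypothetical sample data, purely for illustration
exam_scores = [72, 85, 90, 66]

# Keep only the scores of 70 or more; the line reads almost like plain English
passing_scores = [score for score in exam_scores if score >= 70]

print(average(passing_scores))
```

Even someone who has never written Python can usually guess what each line does, which is the point the design was aiming for.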
2. An object-oriented programming language
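One other main difference is that Python is built around object-oriented programming. R has object systems of its own (such as S3 and S4), but everyday R code tends to be written in a more functional style.
This means that Python code is organized around objects, which bundle data together with the methods that operate on that data.
This focus on objects naturally places less emphasis on the functional style of programming.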
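To make the idea of "controlling and manipulating objects" concrete, here is a minimal sketch of a class. The `Dataset` class, its methods, and the sample values are hypothetical and exist only for this illustration.

```python
class Dataset:
    """A hypothetical container that bundles data with the methods acting on it."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def add(self, value):
        """Append a new observation to the dataset."""
        self.values.append(value)

    def mean(self):
        """Return the arithmetic mean of the stored values."""
        return sum(self.values) / len(self.values)


# Create an object, change its state, then ask it a question
heights = Dataset("heights_cm", [160, 172, 181])
heights.add(167)
print(heights.name, heights.mean())
```

The data (`values`) and the behavior (`add`, `mean`) live together in one object, which is the style Python encourages.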
What were the original design intentions for R?
1. Statistical analysis
The R language was made by statisticians, for statisticians! This naturally produced a statistical language with a large set of statistical tools built in, so many analyses need no additional packages.
This is an advantage for the everyday data analysis that statisticians do.
2. Creating statistical software packages
Because R is a statistical language, statisticians can develop new tools in R and share them with others as packages.
This is especially useful for research scientists and statistical researchers, who gain much value from open-source sharing.
What are the main applications of Python?
1. Web applications
Because Python is a general-purpose programming language, many software engineers already know the language and the structure of its syntax.
This familiarity carries over to engineering environments, where Python can readily be used for building web apps.
2. Web scraping
A great library available in the Python ecosystem is “Beautiful Soup”. I know, the name sounds odd, but it’s a really great tool that many data scientists use for one thing: web scraping.
This is a life-changer for data scientists and researchers who need to collect information from the web through an automated Python script.
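To show the basic idea behind web scraping, here is a minimal sketch that pulls every link out of a piece of HTML. Note that it uses `html.parser` from Python's standard library rather than Beautiful Soup, so it runs without installing anything; Beautiful Soup wraps the same idea in a more convenient interface. The `LinkCollector` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical page content; a real script would download this from a URL
page = """
<html><body>
  <a href="https://example.com/data.csv">Dataset</a>
  <a href="https://example.com/about">About</a>
  <p>No link here</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)
```

In a real scraper the HTML would come from a web request instead of a string, and the collected links or text would then be saved for analysis.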
3. Artificial Intelligence (AI)
Another much-hyped area that might have boosted Python’s user base over the past few years is AI, through libraries such as Keras, TensorFlow, and PyTorch. These libraries use advanced statistical learning methods, deep learning in particular, to produce predictions for data scientists.
With such powerful resources available, it is no surprise that many data scientists make use of them.
What are the main applications of R?
1. Statistical Research
As mentioned above, R provides a great platform for statisticians to develop reproducible statistical packages that advance machine learning methods.
Some commonly used methods are linear regression, clustering algorithms, and deep learning.
2. Biomedical & Healthcare Research
R’s statistical focus led to its quick adoption by many scientists in the biomedical and life science fields. This area of research requires handling vast amounts of biological data that are prone to noise and additional biological complexities.
R gives these scientists access to powerful algorithms and to package collections like Bioconductor, which provide many biologically relevant statistical analyses.
Moreover, one branch of biomedical research is healthcare analytics. This has been a strongly rising trend over the past decade, as Electronic Medical Records (EMRs) have come into use.
As EMRs produce more healthcare data, R has seen adoption by translational scientists and health informaticians because it works well across biologically meaningful datasets.
3. Statistical and machine learning
R provides a large selection of packages that give its users more options for statistical testing and prediction.
This added statistical computing power gives many academics and research professionals the level of control they need over their data analysis.
Now that I have considered both languages in terms of their design intentions and their applications, I hope you can better understand how each is useful in its own way.
1. User audience size
With these two aspects considered, we can start to think about how large an audience each language caters to. For example, Python is great for engineering. This means that the large pool of software engineers out there would likely give Python a shot rather than R.
R is used by statisticians and scientists. As useful as it is within the academic and statistical fields, those users are fewer in number than the software engineers out there.
2. Incentive to learn
Most people tend to choose what is familiar to them. This can explain why many choose to start their data science journey with Python. Engineers familiar with object-oriented Python are much more likely to pick it.
Because it transfers readily to their engineering work, Python can also be integrated more easily with their existing systems.
Additionally, Python’s simple syntax is an instant win for first-time learners of programming.
Although some of these beginners may use Python only for a short period, they still add to the count of Python users, which may push that number above R’s.
The reasons most of us use Python over R are many: ease of use, similarity to engineering systems, and object-oriented programming. However, considering the original purposes of the languages, I found that each fills a specific function within the data science umbrella.
Python covers a wide range of functionality, while R covers a narrower, more specialized one.
Therefore, the specific niches filled by the respective languages result in an imbalance in popularity in favor of Python.