Python VS R: Which One Should You Use For Data Science?
Harvard Business Review has called data scientist the sexiest job of the 21st century. Two languages dominate the data science industry: R and Python.
While both have gained prominence among programmers and developers, they are at war with each other to become the language of choice for data scientists.
History of Python & R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories by John Chambers and colleagues. R can be considered as a different implementation of S.
Python grew out of the ABC language, and its origins date back to December 1989. Created by Guido van Rossum as a hobby project to work on during the week around Christmas, Python is famously named not after the constrictor snake but after the British comedy series Monty Python’s Flying Circus.
Let’s dive a little deeper to compare the two.
This is by no means an exhaustive list, but here are a few parameters on which we compare the two languages to identify which one comes out on top in which scenario.
Ease Of Learning
R has a steep learning curve, and people without programming experience may find it overwhelming at first. Once you get the hang of it, though, you can build on that foundation and do advanced work quickly.
Python’s emphasis on simplicity and readability makes it very popular among beginners. It is generally considered easier to pick up.
Another advantage of Python is that it is a more general programming language. For someone interested in becoming a general-purpose programmer, Python is a better choice.
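To give a feel for the readability mentioned above, here is a minimal sketch that summarizes a small list of numbers using only Python's standard `statistics` module. The function name `summarize` and the sample data are made up for this illustration.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation
    }


# Hypothetical sample data, invented for this example.
heights_cm = [150, 160, 170, 180, 190]
summary = summarize(heights_cm)
print(summary)
```

Even without prior Python experience, most readers can probably guess what each line does, which is a large part of the language's appeal to beginners.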
R is used primarily in research and academia but it is slowly making a mark in the enterprise market. Oracle, Microsoft, and IBM are just a few of the many companies that are developing R packages for use with their existing services and databases.
R is the go-to language when data analysis tasks require standalone computing or analysis on individual servers.
Python is used by programmers and engineers from diverse backgrounds who want to delve into data analysis or apply statistical techniques, particularly when the results have to be integrated with web apps or the code needs to be incorporated into a production database.
R, like Python, is a high-level language, but its code for general-purpose tasks can be more verbose than the equivalent Python.
Python is often regarded as the faster of the two for general-purpose work, and has therefore been a common choice for building mission-critical applications that need to execute quickly.
Libraries & Code Repositories
Primarily designed for statistical computing, R offers an excellent set of high-quality packages for statistical data collection and visualization.
CRAN is a huge repository of R packages to which users can easily contribute. The packages consist of R functions, data, and compiled code that can be installed. It also hosts a long list of popular packages such as dplyr and data.table.
Python is more of a general-purpose language with a rich set of libraries for a wide range of purposes. Its packages are distributed through the Python Package Index (PyPI), a repository of Python software. Although users can contribute to PyPI, it is a complicated process. Managing the dependencies and installation of Python libraries can also be cumbersome.
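One small way to take some of the guesswork out of Python dependencies is to check which packages are installed before relying on them. The sketch below uses the standard library's `importlib.metadata` (available since Python 3.8). The helper name `installed_version` and the package names in the loop are illustrative assumptions, not a recommended set.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Example names only; substitute the packages your own project needs.
for name in ("numpy", "pandas", "a-package-that-does-not-exist"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

This does not resolve version conflicts, but it can make a missing dependency fail with a clear message instead of an import error deep inside an analysis.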
Graphics & Visualization
You will come across many data science professionals who prefer R to Python because of its visualization libraries (both static and interactive), interactive style, and advanced reporting capabilities.
R packages such as plotly, highcharter, dygraphs, and ggiraph provide interactive visualizations that will make users fall in love with the language.
Python has libraries for practically every data visualization need. While most of the libraries are highly specific and accomplish a single task, some are generic and can be used by users from any field.
If you are looking for higher performance or structured code, Python is the go-to language.
Data Handling Capabilities
R is convenient for analysis due to the high number of packages, readily usable tests and the advantage of using formulas. It is optimal for basic data analysis without the installation of packages. Big datasets require the use of packages such as data.table and dplyr.
In its early days, Python’s limited selection of packages was an issue, but the situation has improved with recent releases. NumPy and pandas are used for data analysis in Python, and both R and Python are suitable for parallel computing.
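As a rough illustration of the kind of data handling discussed here, the sketch below groups records and averages a value per group. In practice this is usually done with pandas (its `DataFrame.groupby` method covers the same operation); the version here sticks to the standard library so it runs without installing anything. The function name `group_mean` and the sales records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_mean(rows, key, value):
    """Group dict rows by `key` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {name: mean(vals) for name, vals in groups.items()}


# Hypothetical records, invented for this example.
sales = [
    {"region": "north", "amount": 100},
    {"region": "south", "amount": 80},
    {"region": "north", "amount": 140},
    {"region": "south", "amount": 120},
]
result = group_mean(sales, "region", "amount")
print(result)
```

For larger datasets, a dedicated library such as pandas in Python or data.table in R is likely to be both faster and more convenient than a hand-written loop like this one.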
Neither R nor Python offers customer service support, but both have online communities that provide help. Python has a much bigger community, so you are more likely to get help if you run into trouble.
R has a huge community with support available in the form of mailing lists, user-contributed documentation, and active Stack Overflow members. Python support can likewise be found on Stack Overflow, on mailing lists, and in user-contributed code and documentation.
Both languages started in the early 1990s, but Python has witnessed massive growth in popularity and adoption in recent years. Moreover, Python users are more loyal to their language: the percentage of people switching from R to Python is twice as large as the percentage switching from Python to R.
In 2018, R saw a percentage change in popularity of 4.23%, whereas for Python that number was 21.69%.
Clearly, Python has overtaken R in terms of popularity in recent times and it looks like the trend is going to continue in 2019.
Software companies are more inclined towards emerging technologies such as Machine Learning, Artificial Intelligence, and Big Data, which explains the huge demand for Python engineers.
While both R and Python can be used for statistical analysis, Python has an edge because it is readable and easy to understand.
Python can give you skills that apply across a wide range of job roles since its applications are not just limited to data science but extend to web development, game development, and applications.
With the versatility that Python brings, you will find many more Python job openings in 2019 compared to R.
Usually, statisticians or data analysts start with R and developers with Python.
R is great for prototyping and for statistical analysis. It has a huge set of libraries that lets its users experiment with different types of statistical analysis.
However, the syntax can be difficult to grasp at times and R is also harder to integrate into a production workflow.
Python is an easier language to learn thanks to its focus on code readability, and it can easily be integrated into a production workflow. It has emerged as the de facto scripting language in recent times.
However, Python isn’t as thorough and comprehensive for statistical analysis as R, since Python goes beyond data science and is widely used in applications, web development, and even game development.