If, like me, you wonder about trends in programming language usage in bioinformatics, I figured out a simple and intuitive way to get some stats on the question without having to conduct a survey.
Even if it is not necessarily accurate or representative, it gives an idea of which languages bioinformaticians are using today, and you can relate these results to the kinds of tasks that call for one language rather than another.
The method is simple. Two of the best social coding websites are without doubt GitHub and Bitbucket (we hope biocodershub will one day be as popular as these two; the road is long and we are too small to compete with them, at least for now).
These websites offer rich APIs that one can integrate into another system (a website, for example) or simply use to manage an account, by updating existing source code or submitting new code.
I used the GitHub API at its most basic level (curl) to find users and source code related to bioinformatics. The command below lists source code and repositories by keyword. The output is in JSON format; I piped it into Python for formatting and got these results:
As you can see, there is a lot of information here, such as the user name, the repository description, the number of followers, the date of submission, etc.
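A keyword search of this kind can also be sketched in Python. This is only an illustration, assuming today's GitHub REST search endpoint (`/search/repositories`), which may differ from the one used for this post. The helper names (`build_search_url`, `fetch_json`, `summarize`, `main`) and the `User-Agent` string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the current public GitHub REST API root.
API_ROOT = "https://api.github.com"


def build_search_url(keyword, per_page=30):
    """Build a repository-search URL for the given keyword."""
    query = urllib.parse.urlencode({"q": keyword, "per_page": per_page})
    return f"{API_ROOT}/search/repositories?{query}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body (unauthenticated, so rate-limited)."""
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "User-Agent": "language-trend-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(search_result):
    """Keep only a few fields of interest from each repository entry."""
    return [
        {
            "owner": item["owner"]["login"],
            "name": item["name"],
            "description": item.get("description"),
            "language": item.get("language"),
            "created_at": item.get("created_at"),
        }
        for item in search_result.get("items", [])
    ]


def main():
    """Search for 'bioinformatics' and pretty-print the trimmed results."""
    result = fetch_json(build_search_url("bioinformatics"))
    print(json.dumps(summarize(result), indent=2))
```

Calling `main()` performs one unauthenticated request and prints indented JSON, which plays the same role as piping the curl output through a formatter.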
I played with some other API commands to extract the languages used by these people and tried to get some stats. Here are the results I obtained:
Well! It looks like bioinformaticians on GitHub are heavy Python users. After looking at their repos, I found a lot of web scripting and format conversion use cases in the context of NGS data analysis. R is used together with other languages for statistical analysis (the pie chart above is not really fair to R; one could further classify these results by analysis context).
Python is growing in popularity and Perl is declining. Not a surprise, some of you will say, and that’s true. To be honest, I don’t know what makes the difference; if you have arguments that explain this popularity, please add a comment below.
C/C++ are used for heavy tasks that need a lot of memory allocation and computation. Ruby is growing within the community as well.
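For readers who want to compute language shares like the ones discussed above, here is a minimal sketch. It assumes each repository is a dictionary with a `language` key (the primary language GitHub reports for a repository), which may not be how the numbers in this post were obtained. The function name `language_shares` and the sample entries are invented for illustration and are not the data behind this post.

```python
from collections import Counter


def language_shares(repos):
    """Return (language, count, percent) tuples, most common first.

    Repositories with no detected language are grouped under "Unknown".
    """
    counts = Counter(repo.get("language") or "Unknown" for repo in repos)
    total = sum(counts.values())
    return [
        (language, count, round(100.0 * count / total, 1))
        for language, count in counts.most_common()
    ]


# Made-up sample, only to show the expected input shape.
example_repos = [
    {"name": "repo-a", "language": "Python"},
    {"name": "repo-b", "language": "Python"},
    {"name": "repo-c", "language": "R"},
    {"name": "repo-d", "language": None},
]

for language, count, percent in language_shares(example_repos):
    print(f"{language}: {count} repos ({percent}%)")
```

Counting only the primary language undercounts languages that are mostly used alongside others, which is one reason a chart built this way can be unfair to R.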
The trend also shows (stats not shown here) that more and more bioinformatics projects are standalone data analysis tools (web portals are over, ladies and gents!). This is completely understandable: for a long time web portals and servers were something of a “fashion” in the field. That trend is over. We have to analyze large amounts of data nowadays, and developers are more concerned with the amount of useful information out there than with displaying results nicely in a browser or a database.
If you have comments, let us know by commenting on this post!
Category: Geeks Corner
About the Author: Rad is a Bioinformatics Research Scientist at the University of Maryland. Currently his work is related to transcriptional regulation and computational epigenomics.