The Future of R

On a day-to-day basis, I use the R programming language more than any other language (e.g., Python, Java, Fortran, Matlab…), and there is a good reason for it: R excels at the sort of work I do. It has been extremely well suited to my modeling work, especially for analyzing output and making figures, but it has also been versatile enough for the computationally intensive task of running the model itself. Having said that, I need to be honest about where I see the R programming language standing now and into the future (via tirade). Included in this will be a combination of strengths, weaknesses and features that will either make or break R in the coming years.

The Good

One area where R has excelled is the interactive nature of the language. While Python and Matlab are both interactive, I’ve only ever seen explicit use of interactive and non-interactive modes in R. An R script can behave differently depending on whether or not it is run interactively (the built-in interactive() function reports which mode is active). While trivial, this simple idea can help make functions, classes, and packages more user friendly and versatile by making the UI easier to piece together.

But I must admit that the golden egg of R, so to speak, has to be the package management system. With a simple one-line command (install.packages()), any package available from a repository (e.g. CRAN) can be quickly downloaded and installed without a second thought, and this is precisely what a package management system should be. Over the years Python has improved on this front, with pip adding similar functionality, but R is the only language I’ve seen where the package management system is integral to the design of the language.

For comparison, Matlab is the antithesis of a well-executed package management system. Not only are many “official” packages, or Toolboxes, not free, the ones that are free all have to be downloaded and installed by hand from websites and forums. Frankly, it is a mess and a real pain in the ass.

The Bad

Every programming language has some conception of a Namespace, so let’s take a look at how R manages its own. A Namespace is a generic term for how a programming language manages and structures the various names of the variables, functions and other constructs within the program. For example, consider the following Python program.

def g():
    x = 10
    print(x)

x = 3
g()
print(x)

Scoping refers to where in a Namespace, or perhaps more correctly, in which Namespace, a variable is defined. So above, there is a variable $x$ inside $g()$ as well as a variable $x$ outside the function. Are they the same, and will the script print $10$ and then $10$, or $10$ and then $3$? Since Python has simple and intuitive scoping rules, it is easy to figure out where in the Namespace a variable or function resides. Assigning to $x$ inside $g()$ creates a new variable local to the function, so the script outputs $10$ and then $3$: the $x$ inside the function is not the same as the $x$ outside the function.
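To make the rule concrete, here is a minimal sketch of how the script above changes once a function explicitly asks for the outer variable. The function names shadow and rebind are made up for illustration and are not part of the original script. The global statement is standard Python.

```python
x = 3  # module-level (global) name


def shadow():
    # Plain assignment creates a new local x; the global x is untouched.
    x = 10
    return x


def rebind():
    # `global` declares that x refers to the module-level name,
    # so this assignment rebinds the outer variable.
    global x
    x = 10
    return x


inner = shadow()
after_shadow = x  # the global x, read after the local assignment

rebind()
after_rebind = x  # the global x, read after the explicit rebinding

print(inner, after_shadow, after_rebind)  # prints: 10 3 10
```

The point is that Python only touches the outer $x$ when the function says so. Without the declaration, an assignment always stays local.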

The R language unfortunately uses a different set of scoping rules, and while it may just be my own inability to shift from the set used in Python, Java, C, and others, the Namespace in R is all messed up. Not only is it hard to figure out how a variable is scoped, there is no easy way to ensure a variable is in a particular Namespace. Such an opaque system makes high-level programming challenging, since program behaviour is reliant on how the Namespace is set up and maintained. I would really like to see this changed (perhaps in an offshoot of R).

The Ugly

Multithreading is not only a valuable commodity in these days of high-powered, multi-core machines, it is a virtual prerequisite for scientific computations, where most problems naturally scale across processes. Since R is based on S, and both emerged when multi-core machines were relegated to costly servers, neither has any ability to run on more than one thread. In fact, so much of the language is incompatible with the concept that studies looking at how to adapt R to a multithreaded design have come up with nil (except perhaps not using R).

Since R is good at interfacing with other languages, there are workarounds: if certain algorithms are bottlenecking the pipeline, they can be written in C, Fortran, Java, Python, etc. without too much difficulty. Just like Python, R makes for a good ‘glue’ language to piece together the various blocks (and then analyze the results).

 

 
