Julia Language for Scientists – Thomas Bryce Kelly

Today I wanted to take a moment and introduce a programming language to my friends and colleagues. While certainly not a common topic, I do so because I believe that it is well worth your time. Rather than trying to get non-programmers into a language, I write this for the already initiated such as those who use Matlab, R or Python (amongst others) since the real benefits of learning one language rather than another comes when the tasks at hand are limited by the language itself.

For example, consider if you’re problem or task (such as a model) was required to run on a cluster. More than likely this would mean that R is not the best language for your project since multi-threading is not central to the language. Instead something like Java, Python, or Go may be better first picks. So without further ado, meet Julia.

Julia is the new-kid on the block having been around for just a few years so far, but already it is making waves in the scientific computing world by bringing together the idiomatic programming of python with the speed of C or Fortran. While still in the early releases (0.5.0 recently came out), the language has matured quite a bit in a short amount of time. Matlab users will find the transition to Julia easier than switching to Python or R (and visa-versa) while being designed as a full-bodied programing language with structures (types), methods (functions) and access to low-level control of memory and data. As a typed language, every variable has an explicit representation with memory (e.g. as a 64 bit integer or as a character or floating point number or…) which can dramatically accelerate the speed of computation. In fact, here is the type-tree of Julia:

Just like any language, the upfront cost is all about learning the semantics of the language and the typography associated with defining functions, initializing variable and the like. For example, it can take a little bit to remember how to initialize an array of ones in various languages:

# java
double[] a = new double[100];
for (int i=0; i < 100; i++) {
    a[i] = 1;
}

# python
a = [1 for i in range(100)]

# R
a = rep(1, 100)

# matlab
a = ones(100)

# julia
a = ones(100)

Clearly there is a lot of different ways to complete the task, each with their own idiosyncrasies. In Java (or C), python and R there are no “magic” commands^[1] for this sort of initialization which has some pros and cons depending on the user. I personally like no having to remember what the function name is in order to perform these simple tasks. In java whenever I go to initialize an array I have no choice but to control how and when I do it and never have to worry about remember any keywords. On the flip side, the built-in ones() function is sure to be highly optimized and efficient at what it does. So for high-level languages, it does often make the most sense, especially when looping can be costly to implement.

Moving on, I just want to give some examples of Julia code and some benchmarks for comparison between Julia and some other languages. I will also preface all of these results by saying that these are far from “scientific” tests and I make no claims about being an expert or efficient coder in ANY language.

For the first test I’ll use the 7th problem from Project Euler which asks:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10 001st prime number?

So here is my Julia code for this (note that it is not optimized or anything):

function isprime(n)
    m = convert(Int64, floor(sqrt(n)))
    for i=2:m
        if n % i == 0
            return(false)
        end
    end
    true
end

function count(t)
    i = 0
    test = 1
    while i < t
        test += 1
        if isprime(test)
            i += 1
        end
    end
    test
end

## Test
count(6)
# 13

@time count(10001)
#  0.063729 seconds (5 allocations: 176 bytes)
#Out[31]:
#104743

So the answer is 104,743 and it took under 64 milliseconds.

And here’s my python:

import math
def isprime(n):
    m = int(math.sqrt(n))
    for i in range(2,m+1):
        if n % i == 0:
            return(False)
    return(True)

def count(t):
    i = 0
    test = 1
    while i < t:
        test += 1
        if isprime(test):
            i += 1
    return(test)
## Test
count(6)
# 13

%time count(10001)
#CPU times: user 492 ms, sys: 4 ms, total: 496 ms
#Wall time: 716 ms
#Out[30]:
#104743

Python took about 7.7 times longer to solve the problem, and when scaled up to the 100,001th prime the difference expanded to 13.6x speedup for Julia (1.46s vs 19.9s). Again, this is pretty terrible code but I think it represents the sort of code the average user would throw together to try stuff out. Production code would be a bit more polished and optimized a lot more.

Here is the same code transferred to R.

isprime = function(n) {
    m = floor(sqrt(n))
    for (i in 2:(m+1)) {
        if (n %% i == 0) {
            return(0)
        }
    }
    return(1)
}

count = function(t) {
    i = 2
    test = 3
    while (i < t) {
        test = test + 1
        if (isprime(test)) {
            i = i + 1
        }
    }
    test
}

system.time(count(10001))
#   user  system elapsed 
#  2.560   0.000   2.616

While this code is terrible for R, I was able to get a 2-3x speedup by vectorizing this algorithm (still an order of magnitude slower than Julia or Python). Admittedly these are not the problems that R was designed to solve, it is still useful to see where it stands overall.

While I will continue to use R in my daily activities (data analysis), it is very useful to know that Julia is there for me when I have numerically intensive tasks to do. By blending the high-level coding of python and Matlab with the low-level performance of C or FORTRAN, Julia makes for a very interesting project for scientific computations.

At least none that I know of or have seen widely used.