Of Julia and R

8 May 2020

I just tried the same code in R’s data.table and Julia’s DataFrames, and the results are a bit surprising.

I just did a quick analysis of volatility and returns, starting with my usual program — R.

It involves a large data set, CRSP (979M compressed), kept in a data.table. Since I have been thinking about using Julia more, I thought this would be a good experiment.

I initially used daily data, and because the sample has 33,617,369 rows and six columns, it is quite representative of the work I do.

I’ve made comparisons with Julia before: Is Julia ready for prime time? and Which numerical computing language is best: Julia, MATLAB, Python or R?

In any case, I used R version 3.6.0 and Julia 1.4.1. In case anybody complains, yes, this is not the latest version of R, but it’s such a pain to upgrade it on all my systems that I never managed to do it. Besides, it shouldn’t make any difference here.

Anyway, here are the two main calls in the two languages:

data[,list(length(RET),mean(RET),sd(RET)), keyby = list(year,PERMNO)]

by(data, [:year, :PERMNO]) do data
    DataFrame(m = mean(data.RET), s = std(data.RET), c = length(data.RET))
end
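For readers on more recent DataFrames releases, where by has since been deprecated, the same split-and-summarise step can be sketched with groupby and combine. This is only an illustration, not the code that was timed here: the toy data frame and its values are made up, and only the column names year, PERMNO and RET come from the calls above.

```julia
using DataFrames, Statistics

# Made-up stand-in for the CRSP daily file (hypothetical values)
toy = DataFrame(
    year   = [2019, 2019, 2019, 2019, 2020, 2020],
    PERMNO = [10001, 10001, 10002, 10002, 10001, 10001],
    RET    = [0.01, -0.02, 0.03, 0.00, 0.02, 0.04]
)

# Group by (year, PERMNO), then compute mean, standard deviation
# and number of observations of RET within each group
stats = combine(groupby(toy, [:year, :PERMNO]),
                :RET => mean   => :m,
                :RET => std    => :s,
                :RET => length => :c)

println(stats)
```

The result has one row per (year, PERMNO) pair, which is the same shape as the data.table output with keyby, apart from the row ordering.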

The R code looks more readable, which is unusual, as Julia’s code is typically much better looking.

Using one core only, Julia took 2.8 seconds and R 3.1 seconds. That did surprise me. data.table is supposed to be quite fast. We have a benchmark site that regularly compares such data operations, finding that R’s data.table is several times faster than Julia’s DataFrames in most cases. I can’t explain why, but I did only use one core.

But, if I take the total time, including loading the data in, timed by:

time Rscript run.r
time julia run.jl

R took 11.7 seconds and Julia 29.7 seconds. The reason is, of course, it takes forever to load Julia packages.
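To see where the time goes inside a script like run.jl, one can time the package load and the computation separately. The following is a minimal sketch under my own assumptions: the data are random numbers rather than CRSP, the variable names are hypothetical, and the timings will differ from machine to machine. The first call to a function also includes compilation, so it is timed separately from the second.

```julia
# Time the package load on its own
t0 = time()
using DataFrames, Statistics
t_load = time() - t0

# Made-up data: 100 groups of 1,000 random numbers each
toy = DataFrame(g = repeat(1:100, inner = 1000), x = randn(100_000))

# First call: includes compilation of the grouping code
t1 = time()
res = combine(groupby(toy, :g), :x => mean => :m)
t_first = time() - t1

# Second call: compiled code only
t2 = time()
res = combine(groupby(toy, :g), :x => mean => :m)
t_second = time() - t2

println("package load: ", round(t_load, digits = 2), "s")
println("first call:   ", round(t_first, digits = 2), "s")
println("second call:  ", round(t_second, digits = 2), "s")
```

In a command line run, the package load and the first call are paid every time, which is the cost that time julia run.jl picks up.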

When I google such timing results, the answer is usually that it doesn’t matter because one starts Julia once. Once in the REPL, everything is fast.

Fair enough, except I have quite a bit of code that only runs in command line calls.

Besides, waiting a third of a minute before the program has loaded is quite a pain.

It’s a good thing I didn’t try to plot in Julia. Not only does using Plots take a long time, plot() reliably crashed on me, so badly that I had to do killall julia and then also kill the plot window.

Ouch.

But that is not the reason I chose R for my blog Low vol strategies. No, it is merely because I used RMarkdown, which is fantastic for that sort of work. (And I use RMarkdown but not RStudio.)

I am porting my main risk library from R to Julia, so I may end up using her for regular work, like updating extremerisk.org daily, especially if the startup times improve.

© All rights reserved, Jon Danielsson, 2020