R suffers from the two-language problem¶

• Microsimulation modeling is computationally intensive.

• "R is not necessarily an ideal language for computationally intensive processes."[1]

• For high computational performance, C and Fortran are the gold standard.

• Often, the core of the R code is re-written in C++ and seeminglessly integrated into R functions/packages through the Rcpp package.

• However, low-level programming languages, such as C/C++ and Fortran, is an unfamiliar endeavor for many R programmers.

• Is there a programming language like R that can provide high computational performance like C++?

Julia hype train¶

• First appeared in 2012, Julia is an open-source, high-level dynamic programming language that does not demand sacrificng human convenience for high performance.[2,3]
• Although it is a general purpose programming language, it is specifically designed for scientific computing (developed at MIT).
• "It's not exactly ideal for building desktop applications or operating systems, and though you can use it for web programming, it's better suited to technical computing."[4]
• In particular, its popularity has soared in the past 2-3 years among applied mathematicians, machine learning scientists, and computational statisticians.[5]
• What is the common denominator? Numerical computation and iteration.

Three main comparisons¶

• Unlike R but like C++, Julia is a compiled language (although there is a subtle difference).
• Like R but unlike C++, Julia is dynamically typed, i.e., you do not need to declare types like int, float, or double.
• Julia employs a power programming paradigm called multiple dispatch; it is used extensively. Multiple dispatch has been made available in R through the S4 class and in C++ (see YOMM2).
R Julia C++
Compilation None (interpreted) Just-In-Time (JIT) Ahead-Of-Time (AOT)
Type system Dyanmic Dynamic Static

Interpeted vs. JIT vs. AOT¶

• Compilation means that the code is converted into machine code (which is the language that your computer can understand).
• AOT compilers compile the code before the program runs it. You need to compile only once.
• JIT compilers compile the code on the fly when the program runs it. You need to compile again if you close and re-open the program.
• For interpreted languages, the code needs to be translated each time you run it, making it inherently slow.
• For microsimulation modeling, compilation is a must!
In [2]:
# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
sum(α)/length(α)
end

# computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);

  0.014783 seconds (3.74 k allocations: 177.685 KiB)
0.000078 seconds (1 allocation: 16 bytes)


Dynamic vs. Static [6]¶

• In a static language, the type of every expression (i.e., a combination of values, function calls, variables, and operators) is defined.
• On contrary, there is no specific type assigned to an expression in a dynamic language; the type is determined as the program executes.
• Julia is dynamically typed.
• Julia's compiler performs type inference. It is not necessary for every expression to have an inferrable type.
• The compiler analyzes the code, predicts the types of expressions, and then produces efficient machine code.
• In Julia, an expression can be annoated with a type, but the annotated type does not assert the type of the expression. It simply indicates that the expression can be used only for the annoated type.
• However, if you do annotations, then it may help the compiler generate more efficient machine code.
In [4]:
# type annotation example
# :: is used to declare type
@show (1+1)::Int
@show (1+1.0)
@show (1+1.0)::Int

(1 + 1)::Int = 2
1 + 1.0 = 2.0

TypeError: in typeassert, expected Int64, got a value of type Float64

Stacktrace:
[1] top-level scope at show.jl:641
[2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
In [5]:
# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
sum(α)/length(α)
end

# @time is a macro that computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);

# You can declare the type of the input
function compute_mean_typed(α::Vector{Float64})
sum(α)/length(α)
end
@time compute_mean_typed(x);

  0.022176 seconds (3.74 k allocations: 177.685 KiB)
0.000087 seconds (1 allocation: 16 bytes)
0.000070 seconds


Multiple Dispatch [7]¶

• Multiple dispatch is an execution rule by which the behavior of a function is determined by the combination and count of input types of a function.
• It is similar to function overloading in C++, but there is a subtle difference.
• Overloading is based on the static type, and hence the method is chosen at compile-time.
• Multiple dispatch is based on the dynamic type, meaning that the method is chosen at run-time. The actual input type might be different at run-time.
• Its main benefits include: expressivity and code share/re-use.
• Suppose you want to write a generic function that computes a quantile of a distribution.
• In R, you would have to write many if-else statements that account for different distributions (qunif, qbinom, qnorm, qpois, etc).
• In Julia, the same function quantile is used but behaves differently depending on the type of a distribution.
In [6]:
using Distributions
# 50% quantile
# "." is equivalent to broadcasting in Python or apply in R
# quantile is also used to compute the quantile of a vector of numbers
@show quantile.([Uniform(0,1),Normal(0,1),Poisson(1),rand(100)],0.5)

# quantile has 66 different methods
quantile

quantile.([Uniform(0, 1), Normal(0, 1), Poisson(1), rand(100)], 0.5) = Real[0.5, 0.0, 1, 0.45467727634503763]

Out[6]:
quantile (generic function with 66 methods)
In [7]:
# input is a vector of float
function compute_mean_multiple(α::Vector{Float64})
sum(α)/length(α)
end

# same function name but now the input is a vector of numeric strings
function compute_mean_multiple(α::Vector{String})
α = parse.(Float64,α)
sum(α)/length(α)
end

@show methods(compute_mean_multiple);

# sample from Unif(0,1)
x=rand(10)
# convert those numeric values into numeric strings
s_x = string.(x)
# check whether the outputs are equal
println(compute_mean_multiple(x) == compute_mean_multiple(s_x))

methods(compute_mean_multiple) = # 2 methods for generic function "compute_mean_multiple":
[1] compute_mean_multiple(α::Array{String,1}) in Main at In[7]:7
[2] compute_mean_multiple(α::Array{Float64,1}) in Main at In[7]:2
true


• Julia is still in infancy, and hence there is lack of handy packages.
• In annual Julia conferences, core Julia developers discuss bad things about Julia and how they are planning to address them.
• Integrating Julia into an R package might be an issue, depending on your need.

Integrating Julia into an R package¶

• There are R packages (e.g., JuliaCall) that provide an interface to Julia in R.
• You can use those packages to wrap a Julia package into an R package.
• However, if there is a high start-up time, it will likely give an unpleasant experience for R users. Alternatively you might want to create an executable so that user do not need to have Julia installed on their machines.
• There exists a solution using the package called PackageCompiler.jl, but it requires knowledge of C.

My Julia journey for a microsimulation 'Whole Disease' model of asthma¶

• Efficient.
• Julia is written in Julia.
• The package ecosystem is not as rich as R: great opportunity for learning and contributing to the ecosystem.
• Achieving high performance can be challenging as it requires good understanding of Julia.
• I have received great support from the community. One kindly reviewed my Julia code in-depth because he was:

Summary¶

• Julia is a high-level, dynamic programming language that can provide high performance.
• Some belive Julia will become the defacto programming language for scientific computing.
• Julia may be a good alternative to C/C++/Fortran.
• There are two packages in Julia for microsimulation modeling: SimJulia.jl (equivalent to the simmer package in R) and Agents.jl.

References¶

1. Krijkamp EM, Alarid-Escudero F, Enns EA, Jalal HJ, Hunink MM, Pechlivanoglou P. Microsimulation modeling for health decision sciences using R: a tutorial. Medical Decision Making. 2018 Apr;38(3):400-22.
2. Bezanson J, Karpinski S, Shah VB, Edelman A. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145. 2012 Sep 24.
3. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM review. 2017;59(1):65-98.
4. https://www.wired.com/2014/02/julia/