Integrating Julia and R (Studio) for microsimulation modeling: challenges and rewards¶

Tae Yoon (Harry) Lee, Respiratory Evaluation Sciences Program, University of British Columbia¶

Canada Day (July 1)¶

R for HTA Showcase 2021¶

R suffers from the two-language problem¶

Microsimulation modeling is computationally intensive.
"R is not necessarily an ideal language for computationally intensive processes."[1]
For high computational performance, C and Fortran are the gold standard.
Often, the core of the R code is re-written in C++ and seeminglessly integrated into R functions/packages through the Rcpp package.
However, low-level programming languages, such as C/C++ and Fortran, is an unfamiliar endeavor for many R programmers.
Is there a programming language like R that can provide high computational performance like C++?

Julia hype train¶

First appeared in 2012, Julia is an open-source, high-level dynamic programming language that does not demand sacrificng human convenience for high performance.[2,3]
Although it is a general purpose programming language, it is specifically designed for scientific computing (developed at MIT).
"It's not exactly ideal for building desktop applications or operating systems, and though you can use it for web programming, it's better suited to technical computing."[4]
In particular, its popularity has soared in the past 2-3 years among applied mathematicians, machine learning scientists, and computational statisticians.[5]
What is the common denominator? Numerical computation and iteration.

Three main comparisons¶

Unlike R but like C++, Julia is a compiled language (although there is a subtle difference).
Like R but unlike C++, Julia is dynamically typed, i.e., you do not need to declare types like int, float, or double.
Julia employs a power programming paradigm called multiple dispatch; it is used extensively. Multiple dispatch has been made available in R through the S4 class and in C++ (see YOMM2).

	R	Julia	C++
Compilation	None (interpreted)	Just-In-Time (JIT)	Ahead-Of-Time (AOT)
Type system	Dyanmic	Dynamic	Static

Interpeted vs. JIT vs. AOT¶

Compilation means that the code is converted into machine code (which is the language that your computer can understand).
AOT compilers compile the code before the program runs it. You need to compile only once.
JIT compilers compile the code on the fly when the program runs it. You need to compile again if you close and re-open the program.
For interpreted languages, the code needs to be translated each time you run it, making it inherently slow.
For microsimulation modeling, compilation is a must!

In [2]:

# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
    sum(α)/length(α)
end

# computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);

  0.014783 seconds (3.74 k allocations: 177.685 KiB)
  0.000078 seconds (1 allocation: 16 bytes)

Dynamic vs. Static [6]¶

In a static language, the type of every expression (i.e., a combination of values, function calls, variables, and operators) is defined.
On contrary, there is no specific type assigned to an expression in a dynamic language; the type is determined as the program executes.
Julia is dynamically typed.
Julia's compiler performs type inference. It is not necessary for every expression to have an inferrable type.
The compiler analyzes the code, predicts the types of expressions, and then produces efficient machine code.
In Julia, an expression can be annoated with a type, but the annotated type does not assert the type of the expression. It simply indicates that the expression can be used only for the annoated type.
However, if you do annotations, then it may help the compiler generate more efficient machine code.

In [4]:

# type annotation example
# :: is used to declare type
@show (1+1)::Int
@show (1+1.0)
@show (1+1.0)::Int

(1 + 1)::Int = 2
1 + 1.0 = 2.0

TypeError: in typeassert, expected Int64, got a value of type Float64

Stacktrace:
 [1] top-level scope at show.jl:641
 [2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091

In [5]:

# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
    sum(α)/length(α)
end

# @time is a macro that computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);  

# You can declare the type of the input
function compute_mean_typed(α::Vector{Float64})
    sum(α)/length(α)
end
@time compute_mean_typed(x);

  0.022176 seconds (3.74 k allocations: 177.685 KiB)
  0.000087 seconds (1 allocation: 16 bytes)
  0.000070 seconds

Multiple Dispatch [7]¶

Multiple dispatch is an execution rule by which the behavior of a function is determined by the combination and count of input types of a function.
It is similar to function overloading in C++, but there is a subtle difference.
- Overloading is based on the static type, and hence the method is chosen at compile-time.
- Multiple dispatch is based on the dynamic type, meaning that the method is chosen at run-time. The actual input type might be different at run-time.
Its main benefits include: expressivity and code share/re-use.
Suppose you want to write a generic function that computes a quantile of a distribution.
In R, you would have to write many if-else statements that account for different distributions (qunif, qbinom, qnorm, qpois, etc).
In Julia, the same function quantile is used but behaves differently depending on the type of a distribution.

In [6]:

using Distributions
# 50% quantile
# "." is equivalent to broadcasting in Python or apply in R
# quantile is also used to compute the quantile of a vector of numbers
@show quantile.([Uniform(0,1),Normal(0,1),Poisson(1),rand(100)],0.5)

# quantile has 66 different methods
quantile

quantile.([Uniform(0, 1), Normal(0, 1), Poisson(1), rand(100)], 0.5) = Real[0.5, 0.0, 1, 0.45467727634503763]

Out[6]:

quantile (generic function with 66 methods)

In [7]:

# input is a vector of float
function compute_mean_multiple(α::Vector{Float64})
    sum(α)/length(α)
end

# same function name but now the input is a vector of numeric strings 
function compute_mean_multiple(α::Vector{String})
    α = parse.(Float64,α)
    sum(α)/length(α)
end

@show methods(compute_mean_multiple);

# sample from Unif(0,1)
x=rand(10)
# convert those numeric values into numeric strings
s_x = string.(x)
# check whether the outputs are equal
println(compute_mean_multiple(x) == compute_mean_multiple(s_x))

methods(compute_mean_multiple) = # 2 methods for generic function "compute_mean_multiple":
[1] compute_mean_multiple(α::Array{String,1}) in Main at In[7]:7
[2] compute_mean_multiple(α::Array{Float64,1}) in Main at In[7]:2
true

What's bad about Julia?¶

Julia is still in infancy, and hence there is lack of handy packages.
In annual Julia conferences, core Julia developers discuss bad things about Julia and how they are planning to address them.
Integrating Julia into an R package might be an issue, depending on your need.

Integrating Julia into an R package¶

There are R packages (e.g., JuliaCall) that provide an interface to Julia in R.
You can use those packages to wrap a Julia package into an R package.
However, if there is a high start-up time, it will likely give an unpleasant experience for R users. Alternatively you might want to create an executable so that user do not need to have Julia installed on their machines.
There exists a solution using the package called PackageCompiler.jl, but it requires knowledge of C.

My Julia journey for a microsimulation 'Whole Disease' model of asthma¶

Efficient.
Readability and accessibility.
Julia is written in Julia.
The package ecosystem is not as rich as R: great opportunity for learning and contributing to the ecosystem.
Achieving high performance can be challenging as it requires good understanding of Julia.
I have received great support from the community. One kindly reviewed my Julia code in-depth because he was:

Summary¶

Julia is a high-level, dynamic programming language that can provide high performance.
Some belive Julia will become the defacto programming language for scientific computing.
Julia may be a good alternative to C/C++/Fortran.
There are two packages in Julia for microsimulation modeling: SimJulia.jl (equivalent to the simmer package in R) and Agents.jl.

References¶

Krijkamp EM, Alarid-Escudero F, Enns EA, Jalal HJ, Hunink MM, Pechlivanoglou P. Microsimulation modeling for health decision sciences using R: a tutorial. Medical Decision Making. 2018 Apr;38(3):400-22.
Bezanson J, Karpinski S, Shah VB, Edelman A. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145. 2012 Sep 24.
Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM review. 2017;59(1):65-98.
https://www.wired.com/2014/02/julia/
https://towardsdatascience.com/what-you-need-to-know-about-julia-in-2021-22875f34465b
https://stackoverflow.com/questions/28078089/is-julia-dynamically-typed
JuliaCon 2019 | The Unreasonable Effectiveness of Multiple Dispatch | Stefan Karpinski