Integrating Julia and R (Studio) for microsimulation modeling: challenges and rewards

Tae Yoon (Harry) Lee, Respiratory Evaluation Sciences Program, University of British Columbia

Canada Day (July 1)

R for HTA Showcase 2021

R suffers from the two-language problem

  • Microsimulation modeling is computationally intensive.

  • "R is not necessarily an ideal language for computationally intensive processes."[1]

  • For high computational performance, C and Fortran are the gold standard.

  • Often, the core of the R code is re-written in C++ and seeminglessly integrated into R functions/packages through the Rcpp package.

  • However, low-level programming languages, such as C/C++ and Fortran, is an unfamiliar endeavor for many R programmers.

  • Is there a programming language like R that can provide high computational performance like C++?

    julia

Julia hype train

  • First appeared in 2012, Julia is an open-source, high-level dynamic programming language that does not demand sacrificng human convenience for high performance.[2,3]
  • Although it is a general purpose programming language, it is specifically designed for scientific computing (developed at MIT).
  • "It's not exactly ideal for building desktop applications or operating systems, and though you can use it for web programming, it's better suited to technical computing."[4]
  • In particular, its popularity has soared in the past 2-3 years among applied mathematicians, machine learning scientists, and computational statisticians.[5]
  • What is the common denominator? Numerical computation and iteration.

Three main comparisons

  • Unlike R but like C++, Julia is a compiled language (although there is a subtle difference).
  • Like R but unlike C++, Julia is dynamically typed, i.e., you do not need to declare types like int, float, or double.
  • Julia employs a power programming paradigm called multiple dispatch; it is used extensively. Multiple dispatch has been made available in R through the S4 class and in C++ (see YOMM2).
R Julia C++
Compilation None (interpreted) Just-In-Time (JIT) Ahead-Of-Time (AOT)
Type system Dyanmic Dynamic Static

Interpeted vs. JIT vs. AOT

  • Compilation means that the code is converted into machine code (which is the language that your computer can understand).
  • AOT compilers compile the code before the program runs it. You need to compile only once.
  • JIT compilers compile the code on the fly when the program runs it. You need to compile again if you close and re-open the program.
  • For interpreted languages, the code needs to be translated each time you run it, making it inherently slow.
  • For microsimulation modeling, compilation is a must!
In [2]:
# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
    sum(α)/length(α)
end

# computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);
  0.014783 seconds (3.74 k allocations: 177.685 KiB)
  0.000078 seconds (1 allocation: 16 bytes)

Dynamic vs. Static [6]

  • In a static language, the type of every expression (i.e., a combination of values, function calls, variables, and operators) is defined.
  • On contrary, there is no specific type assigned to an expression in a dynamic language; the type is determined as the program executes.
  • Julia is dynamically typed.
  • Julia's compiler performs type inference. It is not necessary for every expression to have an inferrable type.
  • The compiler analyzes the code, predicts the types of expressions, and then produces efficient machine code.
  • In Julia, an expression can be annoated with a type, but the annotated type does not assert the type of the expression. It simply indicates that the expression can be used only for the annoated type.
  • However, if you do annotations, then it may help the compiler generate more efficient machine code.
In [4]:
# type annotation example
# :: is used to declare type
@show (1+1)::Int
@show (1+1.0)
@show (1+1.0)::Int
(1 + 1)::Int = 2
1 + 1.0 = 2.0
TypeError: in typeassert, expected Int64, got a value of type Float64

Stacktrace:
 [1] top-level scope at show.jl:641
 [2] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
In [5]:
# generate values from Uniform(0,1)
x = rand(100000)

# define a function that computes the average of a vector of numbers
function compute_mean(α)
    sum(α)/length(α)
end

# @time is a macro that computes the wall-clock time
@time compute_mean(x);
@time compute_mean(x);  

# You can declare the type of the input
function compute_mean_typed(α::Vector{Float64})
    sum(α)/length(α)
end
@time compute_mean_typed(x);
  0.022176 seconds (3.74 k allocations: 177.685 KiB)
  0.000087 seconds (1 allocation: 16 bytes)
  0.000070 seconds

Multiple Dispatch [7]

  • Multiple dispatch is an execution rule by which the behavior of a function is determined by the combination and count of input types of a function.
  • It is similar to function overloading in C++, but there is a subtle difference.
    • Overloading is based on the static type, and hence the method is chosen at compile-time.
    • Multiple dispatch is based on the dynamic type, meaning that the method is chosen at run-time. The actual input type might be different at run-time.
  • Its main benefits include: expressivity and code share/re-use.
  • Suppose you want to write a generic function that computes a quantile of a distribution.
  • In R, you would have to write many if-else statements that account for different distributions (qunif, qbinom, qnorm, qpois, etc).
  • In Julia, the same function quantile is used but behaves differently depending on the type of a distribution.
In [6]:
using Distributions
# 50% quantile
# "." is equivalent to broadcasting in Python or apply in R
# quantile is also used to compute the quantile of a vector of numbers
@show quantile.([Uniform(0,1),Normal(0,1),Poisson(1),rand(100)],0.5)

# quantile has 66 different methods
quantile
quantile.([Uniform(0, 1), Normal(0, 1), Poisson(1), rand(100)], 0.5) = Real[0.5, 0.0, 1, 0.45467727634503763]
Out[6]:
quantile (generic function with 66 methods)
In [7]:
# input is a vector of float
function compute_mean_multiple(α::Vector{Float64})
    sum(α)/length(α)
end

# same function name but now the input is a vector of numeric strings 
function compute_mean_multiple(α::Vector{String})
    α = parse.(Float64,α)
    sum(α)/length(α)
end

@show methods(compute_mean_multiple);

# sample from Unif(0,1)
x=rand(10)
# convert those numeric values into numeric strings
s_x = string.(x)
# check whether the outputs are equal
println(compute_mean_multiple(x) == compute_mean_multiple(s_x))
methods(compute_mean_multiple) = # 2 methods for generic function "compute_mean_multiple":
[1] compute_mean_multiple(α::Array{String,1}) in Main at In[7]:7
[2] compute_mean_multiple(α::Array{Float64,1}) in Main at In[7]:2
true

What's bad about Julia?

  • Julia is still in infancy, and hence there is lack of handy packages.
  • In annual Julia conferences, core Julia developers discuss bad things about Julia and how they are planning to address them.
  • Integrating Julia into an R package might be an issue, depending on your need.

Integrating Julia into an R package

  • There are R packages (e.g., JuliaCall) that provide an interface to Julia in R.
  • You can use those packages to wrap a Julia package into an R package.
  • However, if there is a high start-up time, it will likely give an unpleasant experience for R users. Alternatively you might want to create an executable so that user do not need to have Julia installed on their machines.
  • There exists a solution using the package called PackageCompiler.jl, but it requires knowledge of C.

My Julia journey for a microsimulation 'Whole Disease' model of asthma

  • Efficient.
  • Readability and accessibility.
  • Julia is written in Julia.
  • The package ecosystem is not as rich as R: great opportunity for learning and contributing to the ecosystem.
  • Achieving high performance can be challenging as it requires good understanding of Julia.
  • I have received great support from the community. One kindly reviewed my Julia code in-depth because he was:
    bored

Summary

  • Julia is a high-level, dynamic programming language that can provide high performance.
  • Some belive Julia will become the defacto programming language for scientific computing.
  • Julia may be a good alternative to C/C++/Fortran.
  • There are two packages in Julia for microsimulation modeling: SimJulia.jl (equivalent to the simmer package in R) and Agents.jl.

References

  1. Krijkamp EM, Alarid-Escudero F, Enns EA, Jalal HJ, Hunink MM, Pechlivanoglou P. Microsimulation modeling for health decision sciences using R: a tutorial. Medical Decision Making. 2018 Apr;38(3):400-22.
  2. Bezanson J, Karpinski S, Shah VB, Edelman A. Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145. 2012 Sep 24.
  3. Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A fresh approach to numerical computing. SIAM review. 2017;59(1):65-98.
  4. https://www.wired.com/2014/02/julia/
  5. https://towardsdatascience.com/what-you-need-to-know-about-julia-in-2021-22875f34465b
  6. https://stackoverflow.com/questions/28078089/is-julia-dynamically-typed
  7. JuliaCon 2019 | The Unreasonable Effectiveness of Multiple Dispatch | Stefan Karpinski