rbytecode icon indicating copy to clipboard operation
rbytecode copied to clipboard

R bytecode assembler/disassembler

rbytecode

Modern R now executes most code in a stack-based virtual machine (VM).

The R code you write is first parsed to an abstract syntax tree, which is then compiled to bytecode, and this bytecode is then executed in R’s bytecode VM.

{rbytecode} provides an assembler and disassembler for R bytecode - allowing you to peek under the hood at what R is doing to execute your code.

The disassembler is a way of inspecting R code internals by dissecting existing code.

The assembler is a way of compiling R bytecode assembly directly into a bytecode object i.e. writing directly in the virtual machine language understood internally by R’s bytecode VM.

This work is heavily based around the {compiler} package (built-in to R) and Luke Tierney’s document A Byte Code Compiler for R

This package has a book!

I have written a book to accompany this package. The book gives some background on R’s execution of code in its stack-based virtual machine and a reference for the different bytecode instructions understood by this VM.

Available from:

  • LeanPub EPUB available for free - but if you felt like showing support there is an option for a small payment.
  • Online html

Ideas for the Future

  • Add support for SWITCH instruction
  • Keep track of stack size during assembly to try and catch bad code.
  • Write a small VM - maybe only covering a subset of instructions.

What’s in the box

  • dis() disassembles R language objects to bytecode assembly
  • disq() a helper function where disq(1 + x) is equivalent to dis(quote(1 + x))
  • asm() an assembler for R bytecode. Takes R bytecode assembly and outputs an executable bytecode object

Installation

You can install from GitHub with:

# install.package('remotes')
remotes::install_github('coolbutuseless/rbytecode')

Disassembler

library(rbytecode)

disq(x + 1) |> 
  as.character()
GETVAR x
LDCONST 1
ADD 
RETURN 
disq(function(x, y = 1) {x + y}) |> 
  as.character()
MAKECLOSURE x; y = 1
  GETVAR x
  GETVAR y
  ADD 
  RETURN 
ENDMAKECLOSURE 
RETURN 

Assembler - simple example

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Create bytecode assembly
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
code <- r"(
LDCONST 1
LDCONST 2
ADD
RETURN
)"

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Assemble the code into a bytecode object
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bc <- asm(code)
bc
<bytecode: 0x1307dd978>
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Evaluate the bytecode
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
eval(bc)
[1] 3

Bytecode Assembly - Fibonacci

The following bytecode assembly is a reimplementation of R code for calculating the 11th Fibonacci number.

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# R implementation
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
i   <- 0
fn2 <- 0
fn1 <- 1
while (i < 10) {
  fnnew <- fn1 + fn2
  fn2   <- fn1
  fn1   <- fnnew
  i     <- i + 1
}
fn1
[1] 89
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Bytecode implementation
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
code <- r"(
LDCONST 0L
SETVAR i
SETVAR fn2
LDCONST 1L
SETVAR fn1
@start 
GETVAR i
LDCONST 10L
LT 
BRIFNOT @end
GETVAR fn1
GETVAR fn2
ADD 
SETVAR fnnew
POP 
GETVAR fn1
SETVAR fn2
POP 
GETVAR fnnew
SETVAR fn1
POP 
GETVAR i
LDCONST 1L
ADD 
SETVAR i
POP 
GOTO @start
@end 
GETVAR fn1
RETURN 
)"

asm(code) |> eval()
[1] 89

Related Software

  • The {compiler} package. One of the base R packages.

Acknowledgements

  • R Core for developing and maintaining the language.
  • CRAN maintainers, for patiently shepherding packages onto CRAN and maintaining the repository