Roc performance benchmark vs Julia
While doing AoC day 2 of 2025, I was blocked by the performance of the interpreter, so here is a small speed comparison of the same program (logically) in Julia and in the Roc interpreter.
TLDR: the Roc interpreter is roughly 2k times slower than Julia on this AoC solution (370s vs 200ms for 10k numbers).
Another remark is that the Roc run time grows much faster than linearly with the number of inputs to process (roughly quadratically between 1k and 10k numbers), while it should only increase linearly.
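To put a number on the growth rate, here is a minimal sketch that estimates the scaling exponent k from the Roc timings reported in this issue, assuming run time behaves like t ∝ n^k between two consecutive measurements. Linear scaling would give k ≈ 1. The helper name `scaling_exponent` is only illustrative, and the two smallest inputs are probably dominated by fixed startup cost, so only the larger ones say much.

```julia
# Timings copied from the comments in the Roc program: (input size, seconds).
timings = [(10, 0.15), (100, 0.4), (1_000, 4.8), (3_000, 45.0), (10_000, 370.0)]

# Local scaling exponent k between two measurements, assuming t ∝ n^k.
scaling_exponent((n1, t1), (n2, t2)) = log(t2 / t1) / log(n2 / n1)

exponents = [scaling_exponent(timings[i], timings[i + 1]) for i in 1:length(timings)-1]

for (i, k) in enumerate(exponents)
    println(timings[i][1], " -> ", timings[i + 1][1], ": k ≈ ", round(k, digits = 2))
end
```

Between 1k and 10k numbers the exponent comes out close to 2, which points to quadratic rather than exponential growth.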
Here is the Julia code, which can be run with julia --project bench_aoc_02.jl
#!/usr/bin/env -S julia --project
# 10k numbers: 200ms
# 10M numbers: 700ms
# On macos M3 with Julia 1.12.2
module AdventOfCodeDayXX

# Compute the sum of invalid numbers in a range
function solve(str_ranges)
    ranges = [parse.(Int, split(r, "-")) for r in split(str_ranges, ",")]
    invalids = collect(x for (a,b) in ranges for x in a:b if is_invalid(x))
    return sum(invalids)
end

# A number is invalid if its base 10 representation
# is a duplicate of the first half of the digits: 123123
function is_invalid(x)
    x_str = string(x)
    mid = length(x_str) ÷ 2
    left = x_str[1:mid]
    right = x_str[mid+1:end]
    left == right
end

function main()
    @show solve("67896666-77896666")
end

function __init__()
    if abspath(PROGRAM_FILE) == @__FILE__
        main()
    end
end

end
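The Julia timings in the comments (200ms for 10k numbers, 700ms for 10M) suggest that most of the 200ms is a fixed cost rather than time spent in `solve`. That is an assumption, not something measured in this issue. A minimal sketch to time only the computation, after a warm-up call so JIT compilation is excluded, could look like this (the functions are repeated so the snippet runs on its own):

```julia
# Same logic as the benchmark above, repeated so the snippet is self-contained.
function is_invalid(x)
    x_str = string(x)
    mid = length(x_str) ÷ 2
    x_str[1:mid] == x_str[mid+1:end]
end

function solve(str_ranges)
    ranges = [parse.(Int, split(r, "-")) for r in split(str_ranges, ",")]
    invalids = collect(x for (a, b) in ranges for x in a:b if is_invalid(x))
    return sum(invalids)
end

# Warm-up call on a tiny range so that compilation is not part of the timing.
solve("10-20")

# The 10k-number range used in the Roc program.
input = "67896666-67906666"

t0 = time_ns()
result = solve(input)
elapsed_ms = (time_ns() - t0) / 1e6

println("result = ", result, ", solve only: ", round(elapsed_ms, digits = 2), " ms")
```

If the fixed-cost assumption holds, the per-number gap between the two implementations is larger than the end-to-end timings suggest.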
Here is the Roc code, which can be run with roc bench_aoc_02.roc
app [main!] { pf: platform "https://github.com/lukewilliamboswell/roc-platform-template-zig/releases/download/0.5/BJBzo2SR2o5w3StmubGWvnPHq6hfePMaNWy5MwkPuZUs.tar.zst" }
# 10 numbers: 150ms
# 100 numbers: 400ms
# 1k numbers: 4.8s
# 3k numbers: 45s
# 10k numbers: 370s
# On macos M3 with Roc compiler version release-fast-d898d8be
import pf.Stdout

# Compute the sum of invalid numbers in a range
solve! : Str => Try(U64, _)
solve! = |input| {
    var $sum = 0
    for range_str in input.trim().split_on(",") {
        (start, end) = parse_range(range_str)?
        Stdout.line!("Range size: ${(end - start + 1).to_str()}")
        var $x = start
        while $x <= end {
            if is_invalid($x) {
                $sum = $sum + $x
            }
            $x = $x + 1
        }
    }
    Ok($sum)
}

# "123-654" -> (123, 654)
parse_range : Str -> Try((U64, U64), _)
parse_range = |range_str| {
    match range_str.split_on("-") {
        [a, b] => Ok((U64.from_str(a)?, U64.from_str(b)?))
        _ => Err(InvalidRangeFormat)
    }
}

# A number is invalid if its base 10 representation
# is a duplicate of the first half of the digits: 123123
is_invalid : U64 -> Bool
is_invalid = |x| {
    s = x.to_str().to_utf8()
    n = s.len()
    mid = n // 2
    left = s.sublist({ start: 0, len: mid })
    right = s.sublist({ start: mid, len: n - mid })
    left == right
}

run! = || {
    input = "67896666-67906666"
    Stdout.line!("Solution: ${solve!(input)?.to_str()}")
    Ok({})
}

main! = |_args| {
    match run!() {
        Ok(_) => Ok({})
        Err(_) => { Err(1) }
    }
}
#8624 improves performance for this issue: 20.2 -> 8.8 seconds.
Using my x64 Ubuntu machine I get the following:
Before (release-fast-b60aa9a3)
$ time ./test_perf
Range size: 10001
Solution: 67896789
real 0m20.239s
user 0m14.762s
sys 0m5.470s
After (release-fast-d631ec2f)
lbw@lbw-B850-GAMING-X-WIFI6E:~/Documents/Github/roc$ ./zig-out/bin/roc build test_perf.roc
lbw@lbw-B850-GAMING-X-WIFI6E:~/Documents/Github/roc$ time ./test_perf
Range size: 10001
Solution: 67896789
real 0m8.834s
user 0m3.205s
sys 0m5.587s
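One thing the two `time` outputs show is that `user` time drops sharply while `sys` time stays at about 5.5s. A small sketch of that arithmetic, using only the numbers above (the names `runs` and `sys_share` are illustrative):

```julia
# `time` output copied from the two runs above, in seconds.
runs = (
    before = (real = 20.239, user = 14.762, sys = 5.470),
    after  = (real = 8.834, user = 3.205, sys = 5.587),
)

# Fraction of wall-clock time spent in the kernel.
sys_share(r) = r.sys / r.real

for name in keys(runs)
    r = runs[name]
    println(name, ": sys is ", round(100 * sys_share(r), digits = 1), " % of wall time")
end
```

After the change, kernel time accounts for more than half of the wall-clock time. What causes it is not established in this thread.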
I thought I'd try out poop; this is the summary:
$ poop ./app-release-fast-b60aa9a3 ./app-release-fast-d631ec2f
Benchmark 1 (3 runs): ./app-release-fast-b60aa9a3
  measurement         mean ± σ            min … max          outliers    delta
  wall_time           20.2s ± 177ms       20.1s … 20.4s      0 ( 0%)     0%
  peak_rss            403MB ± 42.6KB      403MB … 403MB      0 ( 0%)     0%
  cpu_cycles          78.7G ± 1.07G       78.0G … 79.9G      0 ( 0%)     0%
  instructions        342G ± 11.5M        342G … 342G        0 ( 0%)     0%
  cache_references    10.9G ± 44.4M       10.8G … 10.9G      0 ( 0%)     0%
  cache_misses        632M ± 303M         457M … 982M        0 ( 0%)     0%
  branch_misses       207M ± 86.6K        207M … 207M        0 ( 0%)     0%
Benchmark 2 (3 runs): ./app-release-fast-d631ec2f
  measurement         mean ± σ            min … max          outliers    delta
  wall_time           8.87s ± 116ms       8.74s … 8.95s      0 ( 0%)     ⚡- 56.0% ± 1.7%
  peak_rss            403MB ± 231KB       403MB … 403MB      0 ( 0%)     - 0.1% ± 0.1%
  cpu_cycles          16.8G ± 547M        16.2G … 17.2G      0 ( 0%)     ⚡- 78.7% ± 2.4%
  instructions        55.7G ± 11.5M       55.7G … 55.8G      0 ( 0%)     ⚡- 83.7% ± 0.0%
  cache_references    10.8G ± 32.5M       10.8G … 10.8G      0 ( 0%)     - 0.7% ± 0.8%
  cache_misses        691M ± 180M         484M … 816M        0 ( 0%)     + 9.3% ± 89.5%
  branch_misses       16.1M ± 96.7K       16.0M … 16.2M      0 ( 0%)     ⚡- 92.2% ± 0.1%
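For reference, the delta column can be reproduced from the mean values, and the same numbers give a rough instructions-per-cycle figure for each build. This is a sketch using the rounded means from the summary, so the results can differ from poop's own deltas in the last digit. The names `delta_pct` and `ipc` are illustrative.

```julia
# Mean values copied from the poop summary (before = b60aa9a3, after = d631ec2f).
before = (wall_s = 20.2, cycles = 78.7e9, instructions = 342e9, branch_misses = 207e6)
after  = (wall_s = 8.87, cycles = 16.8e9, instructions = 55.7e9, branch_misses = 16.1e6)

# Relative change in percent between two means.
delta_pct(old, new) = 100 * (new - old) / old

for key in keys(before)
    change = round(delta_pct(before[key], after[key]), digits = 1)
    println(rpad(string(key), 16), change, " %")
end

# Instructions retired per CPU cycle for each build.
ipc(m) = m.instructions / m.cycles

println("IPC before: ", round(ipc(before), digits = 2))
println("IPC after:  ", round(ipc(after), digits = 2))
```

The instruction count falls by more than the wall time does, which is consistent with part of the remaining run time being spent outside user-space computation.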
I’m not sure why, but on my machine at your PR (release-fast-33400dee) I get the following:
3K: improves from 45s to 28s
10K: degrades from 370s to 380s