
Roc performance benchmark vs Julia

Open mpizenberg opened this issue 4 weeks ago • 3 comments

While doing AoC 2025 day 2, I was blocked by the performance of the interpreter, so here is a small speed comparison of the same program (logically) in both Julia and the Roc interpreter.

TL;DR: the Roc interpreter is roughly 2k times slower than Julia on this AoC solution (about 370 s versus 200 ms for 10k numbers).

Another observation is that the Roc run time grows much faster than linearly with the number of inputs to process (going from 1k to 10k numbers takes about 77 times longer), while it should only increase linearly.
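To put a number on that growth, the timings recorded in the comments of the Roc listing can be turned into a local scaling exponent k, assuming time is proportional to n^k between two consecutive measurements. This is only a rough sketch in Python, not part of the original benchmark; the names `roc_timings` and `local_exponents` are made up for illustration. The smallest inputs are probably dominated by startup cost, so only the larger sizes are informative.

```python
import math

# (input size, seconds), copied from the comments in the Roc listing.
roc_timings = [(10, 0.150), (100, 0.400), (1_000, 4.8), (3_000, 45.0), (10_000, 370.0)]


def local_exponents(timings):
    """Return k for each consecutive pair, assuming time ~ n**k between them."""
    return [
        math.log(t2 / t1) / math.log(n2 / n1)
        for (n1, t1), (n2, t2) in zip(timings, timings[1:])
    ]


exponents = local_exponents(roc_timings)
for (n1, _), (n2, _), k in zip(roc_timings, roc_timings[1:], exponents):
    # k close to 1 means linear scaling, k close to 2 means quadratic.
    print(f"{n1:>6} -> {n2:>6}: time grows like n^{k:.2f}")
```

With these figures the exponent is about 2.04 for the 1k to 3k step and about 1.75 for the 3k to 10k step, so the growth looks closer to quadratic than to linear.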

Here is the Julia code, which can be run with julia --project bench_aoc_02.jl

#!/usr/bin/env -S julia --project

# 10k numbers: 200ms
# 10M numbers: 700ms
# On macos M3 with Julia 1.12.2

module AdventOfCodeDayXX
    # Compute the sum of invalid numbers in a range
    function solve(str_ranges)
        ranges = [parse.(Int, split(r, "-")) for r in split(str_ranges, ",")]
        invalids = collect(x for (a,b) in ranges for x in a:b if is_invalid(x))
        return sum(invalids)
    end

    # A number is invalid if its base 10 representation
    # is a duplicate of the first half of the digits: 123123
    function is_invalid(x)
        x_str = string(x)
        mid = length(x_str) ÷ 2
        left = x_str[1:mid]
        right = x_str[mid+1:end]
        left == right
    end

    function main()
        @show solve("67896666-77896666")
    end

    function __init__()
        if abspath(PROGRAM_FILE) == @__FILE__
            main()
        end
    end
end

Here is the Roc code, which can be run with roc bench_aoc_02.roc

app [main!] { pf: platform "https://github.com/lukewilliamboswell/roc-platform-template-zig/releases/download/0.5/BJBzo2SR2o5w3StmubGWvnPHq6hfePMaNWy5MwkPuZUs.tar.zst" }

# 10  numbers: 150ms
# 100 numbers: 400ms
# 1k  numbers: 4.8s
# 3k  numbers: 45s
# 10k numbers: 370s
# On macos M3 with Roc compiler version release-fast-d898d8be

import pf.Stdout

# Compute the sum of invalid numbers in a range
solve! : Str => Try(U64, _)
solve! = |input| {
    var $sum = 0
    for range_str in input.trim().split_on(",") {
        (start, end) = parse_range(range_str)?
        Stdout.line!("Range size: ${(end - start + 1).to_str()}")
        var $x = start
        while $x <= end {
            if is_invalid($x) {
                $sum = $sum + $x
            }
            $x = $x + 1
        }
    }
    Ok($sum)
}

# "123-654" -> (123, 654)
parse_range : Str -> Try((U64, U64), _)
parse_range = |range_str| {
    match range_str.split_on("-") {
        [a, b] => Ok((U64.from_str(a)?, U64.from_str(b)?))
        _ => Err(InvalidRangeFormat)
    }
}

# A number is invalid if its base 10 representation
# is a duplicate of the first half of the digits: 123123
is_invalid : U64 -> Bool
is_invalid = |x| {
    s = x.to_str().to_utf8()
    n = s.len()
    mid = n // 2
    left = s.sublist({ start: 0, len: mid })
    right = s.sublist({ start: mid, len: n - mid })
    left == right
}

run! = || {
    input = "67896666-67906666"
    Stdout.line!("Solution: ${solve!(input)?.to_str()}")
    Ok({})
}

main! = |_args| {
    match run!() {
        Ok(_) => Ok({})
        Err(_) => { Err(1) }
    }
}
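As a cross-check that is independent of both runtimes, the same logic can be sketched in Python. This is not part of the original benchmark. `is_invalid_str` mirrors the string comparison used in the Julia and Roc versions, while `is_invalid_arith` is a hypothetical variant that uses only integer arithmetic; whether such a variant would help the Roc interpreter has not been measured here.

```python
def is_invalid_str(x: int) -> bool:
    """String-based check, mirroring the Julia and Roc versions."""
    s = str(x)
    mid = len(s) // 2
    return s[:mid] == s[mid:]


def is_invalid_arith(x: int) -> bool:
    """Same predicate using only integer arithmetic (no string allocation)."""
    digits, t = 1, x
    while t >= 10:
        t //= 10
        digits += 1
    if digits % 2 == 1:
        # An odd number of digits cannot split into two equal halves.
        return False
    half = 10 ** (digits // 2)
    # Compare the leading half of the digits with the trailing half.
    return x // half == x % half


def solve(ranges: str, is_invalid=is_invalid_str) -> int:
    """Sum the invalid numbers over comma-separated inclusive 'a-b' ranges."""
    total = 0
    for r in ranges.strip().split(","):
        start, end = (int(p) for p in r.split("-"))
        total += sum(x for x in range(start, end + 1) if is_invalid(x))
    return total


print(solve("67896666-67906666"))  # 10k-number input used in the Roc program
```

For the 10k-number input both predicates give 67896789, since 67896789 is the only number of the form abcdabcd in that range.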

mpizenberg, Dec 10 '25 18:12

#8624 improves performance for this issue: 20.2 s -> 8.8 s.

On my x64 Ubuntu machine I get the following:

Before (release-fast-b60aa9a3)

$ time ./test_perf 
Range size: 10001
Solution: 67896789

real	0m20.239s
user	0m14.762s
sys	0m5.470s

After (release-fast-d631ec2f)

lbw@lbw-B850-GAMING-X-WIFI6E:~/Documents/Github/roc$ ./zig-out/bin/roc build test_perf.roc 
lbw@lbw-B850-GAMING-X-WIFI6E:~/Documents/Github/roc$ time ./test_perf 
Range size: 10001
Solution: 67896789

real	0m8.834s
user	0m3.205s
sys	0m5.587s
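The two `time` outputs can be broken down a little further. The Python sketch below (illustrative only; `runs` and `sys_share` are made-up names) computes how much of the wall-clock time is reported as kernel time in each run. It only restates the numbers already shown and does not say what the kernel time is spent on.

```python
# Timings in seconds, copied from the two `time ./test_perf` runs.
runs = {
    "release-fast-b60aa9a3": {"real": 20.239, "user": 14.762, "sys": 5.470},
    "release-fast-d631ec2f": {"real": 8.834, "user": 3.205, "sys": 5.587},
}


def sys_share(t):
    """Fraction of wall-clock time reported as kernel (sys) time."""
    return t["sys"] / t["real"]


for name, t in runs.items():
    print(f"{name}: sys is {sys_share(t):.0%} of real")

before, after = runs.values()
print(f"user speedup: {before['user'] / after['user']:.1f}x")
print(f"real speedup: {before['real'] / after['real']:.1f}x")
```

User time drops by about 4.6x while sys time stays near 5.5 s, so sys goes from roughly 27% to roughly 63% of the wall-clock time and the overall speedup is about 2.3x.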

lukewilliamboswell, Dec 10 '25 23:12

I thought I'd try out poop; this is the summary:

$ poop ./app-release-fast-b60aa9a3 ./app-release-fast-d631ec2f 
Benchmark 1 (3 runs): ./app-release-fast-b60aa9a3
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          20.2s  ±  177ms    20.1s  … 20.4s           0 ( 0%)        0%
  peak_rss            403MB ± 42.6KB     403MB …  403MB          0 ( 0%)        0%
  cpu_cycles         78.7G  ± 1.07G     78.0G  … 79.9G           0 ( 0%)        0%
  instructions        342G  ± 11.5M      342G  …  342G           0 ( 0%)        0%
  cache_references   10.9G  ± 44.4M     10.8G  … 10.9G           0 ( 0%)        0%
  cache_misses        632M  ±  303M      457M  …  982M           0 ( 0%)        0%
  branch_misses       207M  ± 86.6K      207M  …  207M           0 ( 0%)        0%
Benchmark 2 (3 runs): ./app-release-fast-d631ec2f
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          8.87s  ±  116ms    8.74s  … 8.95s           0 ( 0%)        ⚡- 56.0% ±  1.7%
  peak_rss            403MB ±  231KB     403MB …  403MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles         16.8G  ±  547M     16.2G  … 17.2G           0 ( 0%)        ⚡- 78.7% ±  2.4%
  instructions       55.7G  ± 11.5M     55.7G  … 55.8G           0 ( 0%)        ⚡- 83.7% ±  0.0%
  cache_references   10.8G  ± 32.5M     10.8G  … 10.8G           0 ( 0%)          -  0.7% ±  0.8%
  cache_misses        691M  ±  180M      484M  …  816M           0 ( 0%)          +  9.3% ± 89.5%
  branch_misses      16.1M  ± 96.7K     16.0M  … 16.2M           0 ( 0%)        ⚡- 92.2% ±  0.1%
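For readers who want to check the delta column, the percentages can be recomputed from the reported means. This is a small Python sketch with the values typed in by hand (the dictionary keys and helper names are illustrative); because it uses the rounded means from the table, results can differ from poop's own figures in the last digit. It also derives instructions per cycle (IPC), which poop does not print.

```python
# Means copied from the poop summary (G = 1e9, M = 1e6).
before = {"wall_time_s": 20.2, "cpu_cycles": 78.7e9,
          "instructions": 342e9, "branch_misses": 207e6}
after = {"wall_time_s": 8.87, "cpu_cycles": 16.8e9,
         "instructions": 55.7e9, "branch_misses": 16.1e6}


def pct_change(old, new):
    """Relative change from old to new, in percent (negative = reduction)."""
    return (new - old) / old * 100.0


def ipc(m):
    """Instructions retired per CPU cycle."""
    return m["instructions"] / m["cpu_cycles"]


for key in before:
    print(f"{key:<14} {pct_change(before[key], after[key]):+.1f}%")
print(f"IPC before: {ipc(before):.2f}, after: {ipc(after):.2f}")
```

The recomputed reductions are about 56.1% (wall time), 78.7% (cycles), 83.7% (instructions) and 92.2% (branch misses), in line with the table. IPC goes from about 4.35 to about 3.32, so the second build executes far fewer instructions but at a lower rate per cycle.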

lukewilliamboswell, Dec 11 '25 00:12

I’m not sure why, but on my machine at your PR (release-fast-33400dee) I get the following:

3K: improves from 45s to 28s
10K: degrades from 370s to 380s

mpizenberg, Dec 11 '25 00:12