Performance regression in Julia 1.12 (beta4) for heavy OLS workloads (beyond package latency)
Description
I also came across the discussion of package-loading latency in 1.12, which mostly attributes regressions to precompilation differences across Julia versions (i.e. to loading or downloading packages). In my benchmarks, however, all packages are already precompiled, so loading times should be negligible; and yet the performance regression actually grows slightly, in absolute terms, as the workload gets heavier.
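For completeness, the precompile cache can be warmed once per Julia version before timing anything, so that package loading only reads existing caches. A minimal sketch (not part of the benchmark itself):

import Pkg
Pkg.precompile()                     # precompile all dependencies of the active project once
@time using GlobalSearchRegression   # on a warm cache this should report little or no compilation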
When running a large number of OLS estimations via GlobalSearchRegression on synthetic data, Julia 1.12 (beta) is significantly slower than both Julia 1.11 (release) and 1.9.4.
Benchmark Code
To mimic the real-world scenario of my work (calling Julia scripts from the console), I wrote the following minimal PowerShell script (using juliaup to manage Julia versions):
& {
    $results = @()
    @(15, 25) | ForEach-Object {
        $cov = $_
        $seed = $cov + 1000
        @('+1.8.5', '+1.9.4', '+lts', '+release', '+beta') | ForEach-Object {
            $v = $_
            # Julia one-liner: synthetic data + exhaustive OLS search via gsreg
            $cmd = 'using Random,DataFrames,GlobalSearchRegression; ' +
                   'rng = MersenneTwister(' + $seed + '); ' +
                   'data = DataFrame(rand(rng,100,' + $cov + '), :auto); ' +
                   'data.y = rand(rng,100); ' +
                   'gsreg("y ~ x*", data)'
            # Average the total wall time of 5 full runs (startup + load + compile + compute)
            $times = 1..5 | ForEach-Object {
                (Measure-Command { julia $v -p auto -e $cmd }).TotalSeconds
            }
            $avg = ($times | Measure-Object -Average).Average
            $results += [PSCustomObject]@{
                JuliaVersion = $v
                Covariates   = $cov
                AverageTime  = [Math]::Round($avg, 4)
            }
        }
    }
    $results |
        Where-Object { $_.Covariates -ne 20 } |
        Sort-Object Covariates, JuliaVersion |
        Format-Table JuliaVersion, Covariates, AverageTime -AutoSize
}
Observed results
On a Windows 11 machine (16 threads, 32 GB RAM) with PowerShell 7.x, running the above script produced:
| JuliaVersion | Covariates | AverageTime (s) |
|---|---|---|
| +1.8.5 | 15 | 35.16 |
| +1.9.4 | 15 | 39.53 |
| +lts | 15 | 56.52 |
| +release | 15 | 47.59 |
| +beta | 15 | 51.09 |
| +1.8.5 | 25 | 154.37 |
| +1.9.4 | 25 | 129.27 |
| +lts | 25 | 140.17 |
| +release | 25 | 128.22 |
| +beta | 25 | 135.89 |
Julia 1.12 (beta) is consistently 5–30% slower than Julia 1.11 (release) and Julia 1.9.4, on both the moderate (15-covariate) and heavy (25-covariate) workloads.
Discussion
Given the package-loading latency discussion and the performance improvements documented in the Julia 1.12 NEWS, one would anticipate that any initial "time-to-first-x" (TTFX) penalty in 1.12 should be progressively amortized over prolonged, compute-intensive runs. However, could it be that under heavier workloads and longer runs the performance gap actually widens in absolute terms rather than narrowing? Might this indicate that factors beyond simple TTFX are contributing to the observed regression? I understand that Julia 1.13 is currently under development; I hope these findings can help with that work.
Split up your measurements. Compare
- Julia startup time
- Package load time
- TTFX (overhead in calling the function for the first time)
- Runtime (with e.g. BenchmarkTools)
Shoving everything into a single number doesn't help much with identifying what has gotten slower.
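A minimal sketch of what such a split could look like within a single Julia session, reusing the gsreg call from the report above (bare startup time still has to be measured from the shell, e.g. with Measure-Command):

t0 = time_ns()
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools
package_load = (time_ns() - t0) / 1e9        # package load time in seconds

rng  = MersenneTwister(1015)
data = DataFrame(rand(rng, 100, 15), :auto)
data.y = rand(rng, 100)

ttfx    = @elapsed gsreg("y ~ x*", data)     # first call: compilation (TTFX) + one run
runtime = @belapsed gsreg("y ~ x*", $data)   # steady-state runtime via BenchmarkTools
println("load=$(package_load)s  ttfx=$(ttfx)s  runtime=$(runtime)s")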
Thanks @KristofferC. Following your advice, I've re-run the benchmarks. Below are the scripts used to generate the new results.
PowerShell Orchestration Script:
& {
    $results = @()
    @(15, 24) | ForEach-Object {
        $cov = $_
        $seed = $cov + 1000
        @('+1.8.5', '+1.9.4', '+lts', '+release', '+beta') | ForEach-Object {
            $v = $_
            # Bare interpreter startup time (no packages)
            $startupTime = (Measure-Command { julia $v -e "exit()" }).TotalSeconds
            # Package load time, estimated as total time of a load-only run minus bare startup
            $loadCmd = 'using Random,DataFrames,GlobalSearchRegression,BenchmarkTools'
            $totalLoadTime = (Measure-Command { julia $v -p auto -e $loadCmd }).TotalSeconds
            $packageLoadTime = $totalLoadTime - $startupTime
            # TTFX and steady-state runtime are measured inside run_benchmark.jl (printed as "ttfx,runtime")
            $benchmarkOutput = (julia $v -p auto .\run_benchmark.jl $cov $seed)
            $splitOutput = $benchmarkOutput.Split(',')
            $ttfxTime = [double]$splitOutput[0]
            $runtime  = [double]$splitOutput[1]
            $results += [PSCustomObject]@{
                JuliaVersion    = $v
                Covariates      = $cov
                StartupTime     = [Math]::Round($startupTime, 4)
                PackageLoadTime = [Math]::Round($packageLoadTime, 4)
                TTFX_Time       = [Math]::Round($ttfxTime, 4)
                Runtime         = [Math]::Round($runtime, 4)
            }
        }
    }
    $results |
        Sort-Object Covariates, JuliaVersion |
        Format-Table JuliaVersion, Covariates, StartupTime, PackageLoadTime, TTFX_Time, Runtime -AutoSize
}
Julia Benchmark Script (run_benchmark.jl):
# File: run_benchmark.jl
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools

function run_workload(rng, cov)
    data = DataFrame(rand(rng, 100, cov), :auto)
    data.y = rand(rng, 100)
    gsreg("y ~ x*", data)
end

function main()
    cov = parse(Int, ARGS[1])
    seed = parse(Int, ARGS[2])

    # First call: includes compilation, so this captures TTFX + one run
    rng = MersenneTwister(seed)
    ttfx_measurement = @timed run_workload(rng, cov)
    ttfx_time = ttfx_measurement.time

    # Subsequent calls: compiled code only, measured with BenchmarkTools
    fresh_rng = MersenneTwister(seed)
    benchmark_results = @benchmark run_workload($fresh_rng, $cov) samples=3 evals=1
    runtime_median = median(benchmark_results.times) / 1e9   # ns -> s

    println("$(ttfx_time),$(runtime_median)")
end

main()
New detailed results
| JuliaVersion | Covariates | StartupTime (s) | PackageLoadTime (s) | TTFX_Time (s) | Runtime (s) |
|---|---|---|---|---|---|
| +1.8.5 | 15 | 0.19 | 20.65 | 13.35 | 0.14 |
| +1.9.4 | 15 | 0.20 | 13.97 | 19.32 | 0.10 |
| +lts | 15 | 0.21 | 12.33 | 32.94 | 0.08 |
| +release | 15 | 0.17 | 14.92 | 23.27 | 0.09 |
| +beta | 15 | 0.23 | 15.85 | 30.12 | 0.09 |
| +1.8.5 | 24 | 0.19 | 20.58 | 74.52 | 59.19 |
| +1.9.4 | 24 | 0.21 | 14.64 | 68.11 | 47.04 |
| +lts | 24 | 0.20 | 11.64 | 80.11 | 46.55 |
| +release | 24 | 0.19 | 15.68 | 67.22 | 44.46 |
| +beta | 24 | 0.23 | 16.43 | 74.64 | 44.32 |
Analysis and Key Observations
The detailed results show that the performance regression is mainly related to TTFX (JIT compilation time). The slowdown is evident in both a light task (15 covariates; ~32,000 OLS estimations) and a heavy task (24 covariates; ~16.8 million OLS estimations). In both scenarios, the lts (v1.10) and beta (v1.12) versions show a significant TTFX penalty compared to the release (v1.11) version. (Note: for the heavy task we used 24 covariates instead of 25 to avoid system resource errors during the BenchmarkTools test loop.) Hope this data is more helpful. Thanks again for pointing me in the right direction!
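For reference, the estimation counts above follow from the exhaustive all-subsets search (assuming, as the numbers suggest, that "y ~ x*" makes gsreg fit every non-empty subset of the candidate covariates):

nmodels(k) = 2^k - 1    # non-empty subsets of k candidate covariates
nmodels(15)             # 32_767      (~32,000 OLS estimations)
nmodels(24)             # 16_777_215  (~16.8 million OLS estimations)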