
Performance regression in Julia 1.12 (beta4) for heavy OLS workloads (beyond package latency)

Open dpanigo opened this issue 6 months ago • 4 comments

Description

I also came across this discussion of package-loading latency in 1.12, which mostly attributes regressions to precompilation differences across Julia versions (relative to loading or downloading packages, in most cases). In my benchmarks, however, all packages are already precompiled, so loading times should be negligible; and yet the performance regression actually grows (slightly) in absolute terms as the workload gets heavier.

When running a large number of OLS estimations via GlobalSearchRegression on synthetic data, Julia 1.12 (beta) is significantly slower than both Julia 1.11 (release) and 1.9.4.

Benchmark Code

To mimic a real-world scenario from my work (e.g. calling Julia code from the console), I wrote the following minimal PowerShell script (using juliaup to manage Julia versions):

& {
  $results = @()

  @(15,25) | ForEach-Object {
    $cov  = $_
    $seed = $cov + 1000

    @('+1.8.5','+1.9.4','+lts','+release','+beta') | ForEach-Object {
      $v = $_

      # Build the Julia workload: 100 observations, $cov candidate covariates,
      # and an exhaustive all-subset OLS search via gsreg
      $cmd =  'using Random,DataFrames,GlobalSearchRegression; ' +
              'rng = MersenneTwister(' + $seed + '); '      +
              'data = DataFrame(rand(rng,100,' + $cov + '), :auto); ' +
              'data.y = rand(rng,100); '                   +
              'gsreg("y ~ x*", data)'

      # Average wall-clock time over 5 cold launches per version
      $times = 1..5 | ForEach-Object {
        ( Measure-Command { julia $v -p auto -e $cmd } ).TotalSeconds
      }
      $avg = ($times | Measure-Object -Average).Average

      $results += [PSCustomObject]@{
        JuliaVersion = $v
        Covariates   = $cov
        AverageTime  = [Math]::Round($avg, 4)
      }
    }
  }

  $results |
    Sort-Object Covariates, JuliaVersion |
    Format-Table JuliaVersion, Covariates, AverageTime -AutoSize
}
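
For reference, the Julia program the script assembles (shown here for 15 covariates, hence seed 1015) expands to:

using Random, DataFrames, GlobalSearchRegression
rng = MersenneTwister(1015)                  # seed = covariates + 1000
data = DataFrame(rand(rng, 100, 15), :auto)  # 100 observations, 15 covariates
data.y = rand(rng, 100)
gsreg("y ~ x*", data)                        # all-subset OLS over x1..x15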

Observed results

On a Windows 11 machine (16 threads, 32 GB RAM) with PowerShell 7.x, running the above script produced:

JuliaVersion  Covariates  AverageTime (s)
+1.8.5        15           35.16
+1.9.4        15           39.53
+lts          15           56.52
+release      15           47.59
+beta         15           51.09
+1.8.5        25          154.37
+1.9.4        25          129.27
+lts          25          140.17
+release      25          128.22
+beta         25          135.89

Julia 1.12 (beta) is consistently slower, by roughly 5–30%, than Julia 1.11 (release) and 1.9.4 on both the moderate (15-covariate) and heavy (25-covariate) workloads.

Discussion

Given the package-loading latency discussion and the performance improvements documented in the Julia 1.12 NEWS, one would expect any initial “time-to-first-x” (TTFX) penalty in 1.12 to be progressively amortized over prolonged, compute-intensive runs. Could it be that under heavier workloads and longer runs the performance gap actually widens in absolute terms rather than narrows? Might this indicate that factors beyond simple TTFX are contributing to the observed regression? I understand that Julia 1.13 is currently under development; I hope these findings can help with that work.

dpanigo · Jun 11 '25 14:06

Split up your measurements. Compare

  • Julia startup time
  • Package load time
  • TTFX (overhead in calling the function for the first time)
  • Runtime (with e.g. BenchmarkTools)

Shoving everything into a single number doesn't help much with identifying what has gotten slower.
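
For instance, a minimal sketch of this kind of split in Julia (illustrative only, not code from the thread; it assumes the packages are already installed and that it runs in a fresh session so load and first-call costs are cold):

t0 = time_ns()
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools
load_s = (time_ns() - t0) / 1e9            # package load time

rng = MersenneTwister(1015)
data = DataFrame(rand(rng, 100, 15), :auto)
data.y = rand(rng, 100)

ttfx_s = @elapsed gsreg("y ~ x*", data)    # first call: JIT compilation + execution
@btime gsreg("y ~ x*", $data)              # steady-state runtime, post-compilation
println("load: $(load_s) s, ttfx: $(ttfx_s) s")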

KristofferC · Jun 11 '25 14:06

Thanks @KristofferC. Following your advice, I've re-run the benchmarks. Here are the scripts used to generate the new results.

PowerShell Orchestration Script:

& {
  $results = @()
  @(15, 24) | ForEach-Object {
    $cov  = $_
    $seed = $cov + 1000
    @('+1.8.5', '+1.9.4', '+lts', '+release', '+beta') | ForEach-Object {
      $v = $_
      # Bare interpreter startup; note this is measured without -p auto,
      # so the subtraction below also absorbs worker-process startup
      $startupTime = (Measure-Command { julia $v -e "exit()" }).TotalSeconds

      $loadCmd = 'using Random,DataFrames,GlobalSearchRegression,BenchmarkTools'
      $totalLoadTime = (Measure-Command { julia $v -p auto -e $loadCmd }).TotalSeconds
      $packageLoadTime = $totalLoadTime - $startupTime
      
      # TTFX and steady-state runtime are printed by run_benchmark.jl as "ttfx,runtime"
      $benchmarkOutput = (julia $v -p auto .\run_benchmark.jl $cov $seed)
      $splitOutput = $benchmarkOutput.Split(',')
      $ttfxTime = [double]$splitOutput[0]
      $runtime = [double]$splitOutput[1]
      
      $results += [PSCustomObject]@{
        JuliaVersion    = $v
        Covariates      = $cov
        StartupTime     = [Math]::Round($startupTime, 4)
        PackageLoadTime = [Math]::Round($packageLoadTime, 4)
        TTFX_Time       = [Math]::Round($ttfxTime, 4)
        Runtime         = [Math]::Round($runtime, 4)
      }
    }
  }
  $results |
    Sort-Object Covariates, JuliaVersion |
    Format-Table JuliaVersion, Covariates, StartupTime, PackageLoadTime, TTFX_Time, Runtime -AutoSize
}

Julia Benchmark Script (run_benchmark.jl):

# File: run_benchmark.jl
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools
function run_workload(rng, cov)
    data = DataFrame(rand(rng, 100, cov), :auto)
    data.y = rand(rng, 100)
    # Call gsreg as a regular function
    gsreg("y ~ x*", data)
end

function main()
    cov = parse(Int, ARGS[1])
    seed = parse(Int, ARGS[2])
    rng = MersenneTwister(seed)
    # First call in a fresh session: includes JIT compilation (TTFX)
    ttfx_measurement = @timed run_workload(rng, cov)
    ttfx_time = ttfx_measurement.time
    # Fresh RNG so the benchmark starts from the same seed as the TTFX run
    # (the RNG state still advances across the 3 samples)
    fresh_rng = MersenneTwister(seed)
    benchmark_results = @benchmark run_workload($fresh_rng, $cov) samples=3 evals=1
    runtime_median = median(benchmark_results.times) / 1e9  # ns -> s
    println("$(ttfx_time),$(runtime_median)")
end
main()
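
For example, the heavy case on the beta channel corresponds to the invocation:

julia +beta -p auto .\run_benchmark.jl 24 1024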

New detailed results

JuliaVersion  Covariates  StartupTime (s)  PackageLoadTime (s)  TTFX_Time (s)  Runtime (s)
+1.8.5        15          0.19             20.65                13.35           0.14
+1.9.4        15          0.20             13.97                19.32           0.10
+lts          15          0.21             12.33                32.94           0.08
+release      15          0.17             14.92                23.27           0.09
+beta         15          0.23             15.85                30.12           0.09
+1.8.5        24          0.19             20.58                74.52          59.19
+1.9.4        24          0.21             14.64                68.11          47.04
+lts          24          0.20             11.64                80.11          46.55
+release      24          0.19             15.68                67.22          44.46
+beta         24          0.23             16.43                74.64          44.32

Analysis and Key Observations

The detailed results show that the performance regression is mainly related to TTFX (JIT compilation time). The slowdown is evident in both a light task (15 covariates; ~32,000 OLS estimations) and a heavy task (24 covariates; ~16.8 million OLS estimations). In both scenarios, the lts (v1.10) and beta (v1.12) versions show a significant TTFX penalty compared to the release (v1.11) version. (Note: for the heavy case we used 24 covariates instead of 25 to avoid system resource errors during the BenchmarkTools test loop.) I hope this data is more helpful. Thanks again for pointing me in the right direction!
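
One quick way to confirm the JIT attribution (a sketch, assuming the same workload setup as in run_benchmark.jl, shown here with the light 15-covariate case): recent Julia versions make @time report the share of elapsed time spent compiling, and launching julia with --trace-compile=stderr additionally lists each method as it is compiled.

using Random, DataFrames, GlobalSearchRegression

rng = MersenneTwister(1015)
data = DataFrame(rand(rng, 100, 15), :auto)
data.y = rand(rng, 100)

@time gsreg("y ~ x*", data)   # first call: reports "% compilation time"
@time gsreg("y ~ x*", data)   # second call: compilation share drops to ~0%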

dpanigo · Jun 11 '25 16:06
