
Performance regression in Julia 1.12 (beta4) for heavy OLS workloads (beyond package latency)

Open dpanigo opened this issue 6 months ago • 4 comments

Description

I also came across this discussion of package-loading latency in 1.12, which mostly attributes regressions to precompilation differences across Julia versions (relative to loading or downloading packages, in most cases). In my benchmarks, however, all packages are already precompiled, so loading times should be negligible; and yet the performance regression actually grows (slightly) in absolute terms as the workload gets heavier.

When running a large number of OLS estimations via GlobalSearchRegression on synthetic data, Julia 1.12 (beta) is significantly slower than both Julia 1.11 (release) and 1.9.4.

Benchmark Code

To mimic a real-world scenario from my work (e.g. calling Julia code from the console), I wrote the following minimal PowerShell script (using juliaup to manage Julia versions):

& {
  $results = @()

  @(15,25) | ForEach-Object {
    $cov  = $_
    $seed = $cov + 1000

    @('+1.8.5','+1.9.4','+lts','+release','+beta') | ForEach-Object {
      $v = $_

      # Build the Julia workload: 100 observations, $cov candidate covariates,
      # and an exhaustive all-subset OLS search via gsreg
      $cmd =  'using Random,DataFrames,GlobalSearchRegression; ' +
              'rng = MersenneTwister(' + $seed + '); '      +
              'data = DataFrame(rand(rng,100,' + $cov + '), :auto); ' +
              'data.y = rand(rng,100); '                   +
              'gsreg("y ~ x*", data)'

      # Average wall-clock time over 5 cold launches per version
      $times = 1..5 | ForEach-Object {
        ( Measure-Command { julia $v -p auto -e $cmd } ).TotalSeconds
      }
      $avg = ($times | Measure-Object -Average).Average

      $results += [PSCustomObject]@{
        JuliaVersion = $v
        Covariates   = $cov
        AverageTime  = [Math]::Round($avg, 4)
      }
    }
  }

  $results |
    Sort-Object Covariates, JuliaVersion |
    Format-Table JuliaVersion, Covariates, AverageTime -AutoSize
}
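
For reference, the Julia program the script assembles (shown here for 15 covariates, hence seed 1015) expands to:

using Random, DataFrames, GlobalSearchRegression
rng = MersenneTwister(1015)                  # seed = covariates + 1000
data = DataFrame(rand(rng, 100, 15), :auto)  # 100 observations, 15 covariates
data.y = rand(rng, 100)
gsreg("y ~ x*", data)                        # all-subset OLS over x1..x15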

Observed results

On a Windows 11 machine (16 threads, 32 GB RAM) with PowerShell 7.x, running the above script produced:

JuliaVersion  Covariates  AverageTime (s)
+1.8.5        15           35.16
+1.9.4        15           39.53
+lts          15           56.52
+release      15           47.59
+beta         15           51.09
+1.8.5        25          154.37
+1.9.4        25          129.27
+lts          25          140.17
+release      25          128.22
+beta         25          135.89

Julia 1.12 (beta) is consistently slower, by roughly 5–30%, than Julia 1.11 (release) and 1.9.4 on both the moderate (15-covariate) and heavy (25-covariate) workloads.

Discussion

Given the package-loading latency discussion and the performance improvements documented in the Julia 1.12 NEWS, one would expect any initial “time-to-first-x” (TTFX) penalty in 1.12 to be progressively amortized over prolonged, compute-intensive runs. Could it be that under heavier workloads and longer runs the performance gap actually widens in absolute terms rather than narrows? Might this indicate that factors beyond simple TTFX are contributing to the observed regression? I understand that Julia 1.13 is currently under development; I hope these findings can help with that work.

dpanigo · Jun 11 '25 14:06

Split up your measurements. Compare

  • Julia startup time
  • Package load time
  • TTFX (overhead in calling the function for the first time)
  • Runtime (with e.g. BenchmarkTools)

Shoving everything into a single number doesn't help much with identifying what has gotten slower.
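
For instance, a minimal sketch of this kind of split in Julia (illustrative only, not code from the thread; it assumes the packages are already installed and that it runs in a fresh session so load and first-call costs are cold):

t0 = time_ns()
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools
load_s = (time_ns() - t0) / 1e9            # package load time

rng = MersenneTwister(1015)
data = DataFrame(rand(rng, 100, 15), :auto)
data.y = rand(rng, 100)

ttfx_s = @elapsed gsreg("y ~ x*", data)    # first call: JIT compilation + execution
@btime gsreg("y ~ x*", $data)              # steady-state runtime, post-compilation
println("load: $(load_s) s, ttfx: $(ttfx_s) s")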

KristofferC · Jun 11 '25 14:06

Thanks @KristofferC. Following your advice, I've re-run the benchmarks. Here are the scripts used to generate the new results.

PowerShell Orchestration Script:

& {
  $results = @()
  @(15, 24) | ForEach-Object {
    $cov  = $_
    $seed = $cov + 1000
    @('+1.8.5', '+1.9.4', '+lts', '+release', '+beta') | ForEach-Object {
      $v = $_
      # Bare interpreter startup; note this is measured without -p auto,
      # so the subtraction below also absorbs worker-process startup
      $startupTime = (Measure-Command { julia $v -e "exit()" }).TotalSeconds

      $loadCmd = 'using Random,DataFrames,GlobalSearchRegression,BenchmarkTools'
      $totalLoadTime = (Measure-Command { julia $v -p auto -e $loadCmd }).TotalSeconds
      $packageLoadTime = $totalLoadTime - $startupTime
      
      # TTFX and steady-state runtime are printed by run_benchmark.jl as "ttfx,runtime"
      $benchmarkOutput = (julia $v -p auto .\run_benchmark.jl $cov $seed)
      $splitOutput = $benchmarkOutput.Split(',')
      $ttfxTime = [double]$splitOutput[0]
      $runtime = [double]$splitOutput[1]
      
      $results += [PSCustomObject]@{
        JuliaVersion    = $v
        Covariates      = $cov
        StartupTime     = [Math]::Round($startupTime, 4)
        PackageLoadTime = [Math]::Round($packageLoadTime, 4)
        TTFX_Time       = [Math]::Round($ttfxTime, 4)
        Runtime         = [Math]::Round($runtime, 4)
      }
    }
  }
  $results |
    Sort-Object Covariates, JuliaVersion |
    Format-Table JuliaVersion, Covariates, StartupTime, PackageLoadTime, TTFX_Time, Runtime -AutoSize
}

Julia Benchmark Script (run_benchmark.jl):

# File: run_benchmark.jl
using Random, DataFrames, GlobalSearchRegression, BenchmarkTools
function run_workload(rng, cov)
    data = DataFrame(rand(rng, 100, cov), :auto)
    data.y = rand(rng, 100)
    # Call gsreg as a regular function
    gsreg("y ~ x*", data)
end

function main()
    cov = parse(Int, ARGS[1])
    seed = parse(Int, ARGS[2])
    rng = MersenneTwister(seed)
    # First call in a fresh session: includes JIT compilation (TTFX)
    ttfx_measurement = @timed run_workload(rng, cov)
    ttfx_time = ttfx_measurement.time
    # Fresh RNG so the benchmark starts from the same seed as the TTFX run
    # (the RNG state still advances across the 3 samples)
    fresh_rng = MersenneTwister(seed)
    benchmark_results = @benchmark run_workload($fresh_rng, $cov) samples=3 evals=1
    runtime_median = median(benchmark_results.times) / 1e9  # ns -> s
    println("$(ttfx_time),$(runtime_median)")
end
main()
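
For example, the heavy case on the beta channel corresponds to the invocation:

julia +beta -p auto .\run_benchmark.jl 24 1024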

New detailed results

JuliaVersion  Covariates  StartupTime (s)  PackageLoadTime (s)  TTFX_Time (s)  Runtime (s)
+1.8.5        15          0.19             20.65                13.35           0.14
+1.9.4        15          0.20             13.97                19.32           0.10
+lts          15          0.21             12.33                32.94           0.08
+release      15          0.17             14.92                23.27           0.09
+beta         15          0.23             15.85                30.12           0.09
+1.8.5        24          0.19             20.58                74.52          59.19
+1.9.4        24          0.21             14.64                68.11          47.04
+lts          24          0.20             11.64                80.11          46.55
+release      24          0.19             15.68                67.22          44.46
+beta         24          0.23             16.43                74.64          44.32

Analysis and Key Observations

The detailed results show that the performance regression is mainly related to TTFX (JIT compilation time). The slowdown is evident in both a light task (15 covariates; ~32,000 OLS estimations) and a heavy task (24 covariates; ~16.8 million OLS estimations). In both scenarios, the lts (v1.10) and beta (v1.12) versions show a significant TTFX penalty compared to the release (v1.11) version. (Note: for the heavy case we used 24 covariates instead of 25 to avoid system resource errors during the BenchmarkTools test loop.) I hope this data is more helpful. Thanks again for pointing me in the right direction!
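
One quick way to confirm the JIT attribution (a sketch, assuming the same workload setup as in run_benchmark.jl, shown here with the light 15-covariate case): recent Julia versions make @time report the share of elapsed time spent compiling, and launching julia with --trace-compile=stderr additionally lists each method as it is compiled.

using Random, DataFrames, GlobalSearchRegression

rng = MersenneTwister(1015)
data = DataFrame(rand(rng, 100, 15), :auto)
data.y = rand(rng, 100)

@time gsreg("y ~ x*", data)   # first call: reports "% compilation time"
@time gsreg("y ~ x*", data)   # second call: compilation share drops to ~0%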

dpanigo · Jun 11 '25 16:06
