Loess.jl

Implement confidence intervals for predictions

Open · andreasnoack opened this issue 2 months ago · 1 comment

This implements the confidence intervals described in Cleveland and Grosse (1991), except for the statistical approximation of the deltas described in section 4 of the paper. They don't seem to share the coefficients of their fit, so that part isn't easy to implement. In addition, computers have much more memory these days, so I don't think the big matrix is a problem in most cases. I'd be interested to hear if anybody knows of more recent approaches to calculating the deltas that avoid the big matrix.
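For context, the pointwise interval from the paper can be sketched as follows. Since a loess prediction is linear in the responses, ŷ(x) = l(x)ᵀy, its standard error is σ̂‖l(x)‖, with the variance estimate and "lookup" degrees of freedom derived from the delta quantities. This sketch assumes the dense hat matrix (the "big matrix") is available; all names here are illustrative, not Loess.jl's actual internals.

```julia
# Illustrative sketch of the Cleveland & Grosse (1991) pointwise interval,
# assuming the dense hat matrix L and a hat-matrix row l for the point of
# interest are at hand. Names are hypothetical, not the PR's internals.
using LinearAlgebra, Distributions

function ci_sketch(l::AbstractVector, L::AbstractMatrix, y::AbstractVector; level=0.95)
    yhat = dot(l, y)                 # loess prediction is linear in y
    r = y - L * y                    # residuals
    B = (I - L)' * (I - L)
    d1 = tr(B)                       # delta_1 (exact; needs the big matrix)
    d2 = tr(B^2)                     # delta_2
    s2 = sum(abs2, r) / d1           # residual variance estimate
    nu = d1^2 / d2                   # "lookup" degrees of freedom
    t = quantile(TDist(nu), 1 - (1 - level) / 2)
    se = sqrt(s2 * dot(l, l))        # standard error of the prediction
    (yhat - t * se, yhat + t * se)
end
```

The statistical approximation mentioned above replaces the exact traces `d1` and `d2` with fitted approximations so that `B` never has to be formed.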

I tried to mimic the interface of predict in GLM but decided to use a struct to return the expensive helper quantities together with the confidence bounds. I'm not a big fan of using Symbols for finite sets of options like this, but that is what GLM currently does. Maybe we should change it, but that is a separate concern.
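As a usage sketch, the GLM-style interface described above might look like this; the `interval` and `level` keywords and the field names on the returned struct are assumptions based on the description, not necessarily the merged API.

```julia
using Loess

xs = 10 .* rand(200)
ys = sin.(xs) .+ 0.3 .* randn(200)
model = loess(xs, ys; span=0.4)

# Hypothetical GLM-style call: `interval` is a Symbol-valued option and
# the result is a struct carrying the bounds plus the helper quantities.
pr = predict(model, range(extrema(xs)...; length=100);
             interval=:confidence, level=0.95)
# pr.prediction, pr.lower, pr.upper   (assumed field names)
```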

With this, you can construct the following plot:

[Screenshot: Loess.jl plot with confidence bounds]

The same plot with ggplot2, which uses Cleveland and Grosse's code for the computations, is:

[Screenshot: the same plot produced with ggplot2]

Closes #29

andreasnoack (Oct 30 '25)

The new implementation stores the rows of the hat matrix (and a bit more) for each vertex when constructing the loess fit. That requires much more memory than the old implementation, which causes the benchmark runs to run out of memory because they test relatively large problems, see https://github.com/JuliaStats/Loess.jl/blob/5ecfefa0b7a5e4e0e1cd4a7ef657493c23d3cb2d/benchmark/benchmarks.jl#L7-L12. It looks like R avoids storing these rows in the loess function and only constructs them when you call predict on the loess object, but then you pay the price of essentially recomputing all the local fits in predict. In my opinion, the main use of Loess these days is visualization with uncertainty bounds, so I'm leaning towards just accepting that the implementation can't handle problems as large as before.
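Back-of-the-envelope, the extra storage is one dense Float64 row of length n per kd-tree vertex, so the cost grows with both the number of observations and the number of vertices (the vertex count below is hypothetical):

```julia
# Rough memory estimate for storing dense hat-matrix rows:
# 8 bytes per Float64, one row of length n per vertex.
hat_rows_bytes(n, nvertices) = 8 * n * nvertices

# e.g. n = 100_000 observations and (hypothetically) 1_000 vertices:
hat_rows_bytes(100_000, 1_000) / 2^30   # ≈ 0.75 GiB
```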

andreasnoack (Oct 30 '25)

Benchmark Report for /home/runner/work/Loess.jl/Loess.jl

Job Properties

  • Time of benchmarks:
    • Target: 27 Nov 2025 - 19:31
    • Baseline: 27 Nov 2025 - 19:31
  • Package commits:
    • Target: 9460c3b
    • Baseline: c61757b
  • Julia commits:
    • Target: ca9b666
    • Baseline: ca9b666
  • Julia command flags:
    • Target: None
    • Baseline: -C,native,-J/opt/hostedtoolcache/julia/1.12.2/x64/lib/julia/sys.so,-g1,-O3,-e,using Pkg; Pkg.update()
  • Environment variables:
    • Target: None
    • Baseline: None

Results

A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less than 1.0 denotes a possible improvement (marked with :white_check_mark:). Brackets display tolerances for the benchmark estimates. Only significant results - results that indicate possible regressions or improvements - are shown below (thus, an empty table means that all benchmark results remained invariant between builds).

| ID | time ratio | memory ratio |
|----|------------|--------------|
| `["random", "100"]` | 1.47 (5%) :x: | 0.33 (1%) :white_check_mark: |
| `["random", "1000"]` | 1.32 (5%) :x: | 1.16 (1%) :x: |
| `["random", "10000"]` | 1.31 (5%) :x: | 1.88 (1%) :x: |
| `["random", "100000"]` | 1.23 (5%) :x: | 2.00 (1%) :x: |
| `["random", "1000000"]` | 1.25 (5%) :x: | 2.02 (1%) :x: |
| `["ties", "sine"]` | 1.38 (5%) :x: | 1.77 (1%) :x: |

Benchmark Group List

Here's a list of all the benchmark groups executed by this job:

  • ["random"]
  • ["ties"]

Julia versioninfo

Target

Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 24.04.3 LTS
  uname: Linux 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  3497 MHz        205 s          0 s         42 s       4858 s          0 s
       #2  3492 MHz        264 s          0 s         52 s       4803 s          0 s
       #3  3492 MHz        524 s          0 s         49 s       4555 s          0 s
       #4  3481 MHz        520 s          0 s         49 s       4552 s          0 s
  Memory: 15.619712829589844 GB (14275.87109375 MB free)
  Uptime: 520.43 sec
  Load Avg:  1.12  0.48  0.2
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, icelake-server)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)

Baseline

Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
      Ubuntu 24.04.3 LTS
  uname: Linux 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz: 
              speed         user         nice          sys         idle          irq
       #1  3493 MHz        211 s          0 s         43 s       5330 s          0 s
       #2  3491 MHz        265 s          0 s         53 s       5280 s          0 s
       #3  3491 MHz        843 s          0 s         55 s       4710 s          0 s
       #4  3486 MHz        668 s          0 s         51 s       4880 s          0 s
  Memory: 15.619712829589844 GB (13947.18359375 MB free)
  Uptime: 568.42 sec
  Load Avg:  1.05  0.56  0.25
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, icelake-server)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)

github-actions[bot] (Nov 05 '25)

I've reduced the problem size of the benchmarks to at most 10000 elements, and the run now completes without hitting the memory limit. The overall runtime is slower, which shouldn't be a surprise given that more work is being done and more allocations are required to compute the uncertainty of the predictions. The question is whether there is a need for very large or very quick loess computations where the uncertainty calculation is just a nuisance.

andreasnoack (Nov 05 '25)

> The question is whether there is a need for very large or very quick loess computations where the uncertainty calculation is just a nuisance.

Sorry for the delay in following up here, but this is exactly my biggest concern. I have a use case where we put a smooth line on a plot with a very large number of points, but I haven't had a chance to test the performance here on that. I also sometimes use a smooth line on plots of Bernoulli data: the smooth gives a good first impression of how the probability changes across different x values. I'll try to find some time to test on these big examples this week!

palday (Nov 10 '25)

@palday It would indeed be good if you could try out some of your larger examples. If we conclude that supporting large problems without uncertainty is important, I can see a couple of ways forward. I'm not particularly happy with any of them, so my preference is for the current version, but I'm of course biased by my use cases. The alternatives I see are:

  1. Compute the hat matrix rows in predict. This essentially means redoing in predict all the fitting done in fit. I think this is what R does.
  2. Use sparse representations of the F matrices. For this to be useful, we'd have to lower the value of span quite a bit; otherwise it won't reduce the storage.
  3. Set a flag in fit to signal whether the fit allows for uncertainty calculations, and error out in predict if needed.
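Alternative 3 could look roughly like this; all type, field, and function names are illustrative, not the package's actual internals:

```julia
# Sketch of alternative 3: an opt-out flag stored on the model, with
# interval prediction erroring when uncertainty was disabled at fit time.
struct LoessModelSketch{T<:AbstractFloat}
    hat_rows::Union{Nothing,Matrix{T}}  # `nothing` => fitted without uncertainty
    # ... remaining fields elided ...
end

function interval_sketch(model::LoessModelSketch, x; level=0.95)
    model.hat_rows === nothing && throw(ArgumentError(
        "model was fitted without uncertainty support; refit to get intervals"))
    # ... compute the prediction and bounds from the stored rows ...
end
```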

andreasnoack (Nov 11 '25)

@devmotion thanks for the comments. I believe I've addressed all of them now.

andreasnoack (Nov 11 '25)

@palday did you get a chance to run your examples? I'm quite eager to get error bounds in Loess plots, so it would be good to find a path forward here.

andreasnoack (Nov 17 '25)

@palday bump. Do you object to this or are you okay with moving forward here?

andreasnoack (Nov 21 '25)

@devmotion it looks like the benchmarks now complete but then fail afterwards with an error about the repo being dirty. Have you seen that error before?

andreasnoack (Nov 25 '25)

No, I have never seen this error. I wonder if it could be related to either the GHA update or the 1.12.2 release.

devmotion (Nov 26 '25)

Ah, I think it could be https://github.com/JuliaCI/PkgBenchmark.jl/issues/135, which was caused by a Project.toml file that Pkg automatically reformatted, making the repo appear dirty in CI.

Edit: Although the Project.toml seems fine to me...

devmotion (Nov 26 '25)

Generally, maybe the benchmark setup would be a bit simpler with AirspeedVelocity.jl. But of course either approach should be fine.

devmotion (Nov 26 '25)

@devmotion your intuition was right here. With Julia 1.12.2, the Pkg.dev command added a `sources` entry to the Project.toml. For now, I'm working around the error by committing the change before running the benchmarks. Given that it is still possible to run the n=100000 benchmarks with a modest time penalty, I suggest we go ahead and merge this one. To give some perspective: the n=100000 example runs in 75 seconds in R, which uses a derived version of the original Fortran implementation, while with the changes in this PR the same problem runs in 63 milliseconds.

andreasnoack (Nov 27 '25)