Implement confidence intervals for predictions
This implements the confidence interval as described in Cleveland and Grosse (1991), except for the statistical approximation of the deltas described in section 4 of the paper. They don't seem to share the coefficients of their fit, so it is not easy to implement that part. In addition, computers have much more memory these days, so I don't think the big matrix is a problem in most cases. I'd be interested if anybody knows about more recent approaches to calculating the deltas that avoid the big matrix.
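To make the computation concrete: for a linear smoother ŷ = L·y, the pointwise standard error at point i is σ·√(lᵢᵀlᵢ), where lᵢ is the i-th row of the hat matrix and σ² is estimated from the residuals using the exact δ₁ and δ₂ traces (the quantities the paper approximates in section 4). A minimal dependency-free sketch, not the PR's actual code, with a fixed normal quantile instead of the t(δ₁²/δ₂) quantile used in the paper:

```julia
using LinearAlgebra

# Pointwise confidence bounds for a linear smoother ŷ = L * y, using the
# exact trace computations instead of the section-4 approximation.
# Illustrative only; z = 1.96 stands in for the t(δ₁²/δ₂) quantile.
function pointwise_ci(L::AbstractMatrix, y::AbstractVector; z::Real = 1.96)
    ŷ = L * y
    r = y - ŷ                           # residuals
    B = (I - L)' * (I - L)              # the "big matrix" from the paper
    δ1 = tr(B)
    δ2 = tr(B * B)
    σ² = dot(r, r) / δ1                 # residual variance estimate
    se = sqrt.(σ² .* vec(sum(abs2, L; dims = 2)))  # sᵢ = σ·√(lᵢᵀlᵢ)
    (fitted = ŷ, lower = ŷ .- z .* se, upper = ŷ .+ z .* se, dof = δ1^2 / δ2)
end
```

With the storage of the hat-matrix rows paid for at fit time, these bounds are cheap to evaluate at prediction time.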
I tried to mimic the interface of `predict` in GLM but decided to use a struct to return the expensive helper quantities together with the confidence bounds. I'm not a big fan of using `Symbol`s for finite options as done here, but that is what GLM currently does. Maybe we should change it, but that is a separate concern.
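The shape of such an interface could look like the following sketch: a `Symbol` keyword selects the interval type (GLM-style), and a small struct bundles the bounds with an expensive helper quantity so callers can reuse it. All names here are hypothetical, not the PR's actual API:

```julia
# Hypothetical GLM-style prediction interface. The struct carries the
# bounds plus a helper quantity that was expensive to compute.
struct PredictionCI{T}
    fitted::Vector{T}
    lower::Vector{T}
    upper::Vector{T}
    residual_scale::T   # expensive helper, returned for reuse by callers
end

function predict_sketch(fitted, se, scale; interval::Symbol = :confidence, z = 1.96)
    interval === :confidence ||
        throw(ArgumentError("unsupported interval type: $interval"))
    PredictionCI(fitted, fitted .- z .* se, fitted .+ z .* se, scale)
end
```

Returning a struct instead of a bare vector keeps the call site simple while still exposing everything needed to draw a ribbon around the smooth.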
With this, you can construct this plot
The same plot produced with ggplot2, which uses Cleveland and Grosse's code for the computations, is
Closes #29
The new implementation stores the rows of the hat matrix (and a bit more) for each vertex when constructing the Loess fit. That requires much more memory than the old implementation, and it causes the benchmark runs to run out of memory because they test relatively large problems, see https://github.com/JuliaStats/Loess.jl/blob/5ecfefa0b7a5e4e0e1cd4a7ef657493c23d3cb2d/benchmark/benchmarks.jl#L7-L12. It looks like R avoids storing these rows in the `loess` function and only constructs them when you call `predict` on the loess object, but then you pay the price of essentially recomputing all the local fits in `predict`. In my opinion, the main use of Loess these days is visualization with uncertainty bounds, so I'm leaning towards just accepting that the implementation can't handle problems as large as before.
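A back-of-envelope estimate shows why the large benchmarks blow up: each stored hat-matrix row has one weight per observation, so in the worst case (about as many vertices as data points; the actual kd-tree typically has fewer, and the PR's per-vertex storage differs in detail) the cost is roughly O(n²) Float64s:

```julia
# Rough memory cost of keeping one dense hat-matrix row per vertex.
# Worst-case assumption: nvertices ≈ n. Illustrative only.
hat_rows_bytes(n, nvertices = n) = nvertices * n * sizeof(Float64)
hat_rows_gib(n) = hat_rows_bytes(n) / 2^30
```

For the largest benchmark sizes this dense worst case reaches tens of GiB, well beyond the roughly 15.6 GB available on the CI runner, which is consistent with the observed out-of-memory failures.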
# Benchmark Report for /home/runner/work/Loess.jl/Loess.jl

## Job Properties
- Time of benchmarks:
    - Target: 27 Nov 2025 - 19:31
    - Baseline: 27 Nov 2025 - 19:31
- Package commits:
    - Target: 9460c3b
    - Baseline: c61757b
- Julia commits:
    - Target: ca9b666
    - Baseline: ca9b666
- Julia command flags:
    - Target: None
    - Baseline: `-C,native,-J/opt/hostedtoolcache/julia/1.12.2/x64/lib/julia/sys.so,-g1,-O3,-e,using Pkg; Pkg.update()`
- Environment variables:
    - Target: None
    - Baseline: None
## Results
A ratio greater than 1.0 denotes a possible regression (marked with :x:), while a ratio less than 1.0 denotes a possible improvement (marked with :white_check_mark:). Brackets display tolerances for the benchmark estimates. Only significant results, i.e. those indicating possible regressions or improvements, are shown below; an empty table means that all benchmark results remained invariant between builds.
| ID | time ratio | memory ratio |
|---|---|---|
| ["random", "100"] | 1.47 (5%) :x: | 0.33 (1%) :white_check_mark: |
| ["random", "1000"] | 1.32 (5%) :x: | 1.16 (1%) :x: |
| ["random", "10000"] | 1.31 (5%) :x: | 1.88 (1%) :x: |
| ["random", "100000"] | 1.23 (5%) :x: | 2.00 (1%) :x: |
| ["random", "1000000"] | 1.25 (5%) :x: | 2.02 (1%) :x: |
| ["ties", "sine"] | 1.38 (5%) :x: | 1.77 (1%) :x: |
## Benchmark Group List
Here's a list of all the benchmark groups executed by this job:
- ["random"]
- ["ties"]
## Julia versioninfo

### Target
```
Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  Ubuntu 24.04.3 LTS
  uname: Linux 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz:
         speed     user  nice  sys   idle  irq
  #1  3497 MHz   205 s   0 s  42 s  4858 s  0 s
  #2  3492 MHz   264 s   0 s  52 s  4803 s  0 s
  #3  3492 MHz   524 s   0 s  49 s  4555 s  0 s
  #4  3481 MHz   520 s   0 s  49 s  4552 s  0 s
  Memory: 15.619712829589844 GB (14275.87109375 MB free)
  Uptime: 520.43 sec
  Load Avg: 1.12 0.48 0.2
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, icelake-server)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)
```
### Baseline
```
Julia Version 1.12.2
Commit ca9b6662be4 (2025-11-20 16:25 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  Ubuntu 24.04.3 LTS
  uname: Linux 6.11.0-1018-azure #18~24.04.1-Ubuntu SMP Sat Jun 28 04:46:03 UTC 2025 x86_64 x86_64
  CPU: Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz:
         speed     user  nice  sys   idle  irq
  #1  3493 MHz   211 s   0 s  43 s  5330 s  0 s
  #2  3491 MHz   265 s   0 s  53 s  5280 s  0 s
  #3  3491 MHz   843 s   0 s  55 s  4710 s  0 s
  #4  3486 MHz   668 s   0 s  51 s  4880 s  0 s
  Memory: 15.619712829589844 GB (13947.18359375 MB free)
  Uptime: 568.42 sec
  Load Avg: 1.05 0.56 0.25
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, icelake-server)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)
```
I've reduced the problem size of the benchmarks to not exceed 10000 elements, and that goes through without hitting the memory limit. The overall runtime is slower, which shouldn't be a surprise given that more work is being done and more allocations are required to compute the uncertainty of the predictions. The question is whether there is a need for very large or very quick Loess computations where the uncertainty calculation is just a nuisance.
> The question is whether there is a need for very large or very quick Loess computations where the uncertainty calculation is just a nuisance.
Sorry for the delay in following up here, but this is exactly my biggest concern. I have a use case where we put a smooth line on a plot with a very large number of points, but I haven't had a chance to test the performance here on that. I also sometimes use a smooth line on plots of Bernoulli data -- the smooth gives a good first impression of how the probability changes across different x values. I'll try to find some time to test on these big examples this week!
@palday It would indeed be good if you could try out some of your larger examples. If we conclude that supporting large problems without uncertainty is important, I can see a couple of ways forward. I'm not particularly happy with any of them, so my preference is for the current version, but I'm of course biased by my use cases. The alternatives I see are:

- Compute the hat matrix rows in `predict`. This essentially means redoing all the fitting from `fit` when running `predict`. I think this is what R does.
- Use sparse representations of the `F` matrices. For this to be useful, we'd have to lower the value of `span` quite a bit; otherwise, it won't reduce the storage.
- Set a flag in `fit` to signal whether the fit allows for uncertainty calculations, and error out in `predict` if needed.
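The third alternative could be sketched like this: record at fit time whether the per-vertex hat-matrix rows were kept, and fail fast at prediction time if not. All names here are hypothetical, not the package's actual API:

```julia
# Sketch of the "flag in fit" alternative. A `nothing` field signals
# that the fit was run without the uncertainty machinery.
struct LoessSketch{T}
    fitted::Vector{T}
    hat_rows::Union{Nothing, Matrix{T}}  # nothing ⇒ fitted with keep_hat = false
end

has_uncertainty(m::LoessSketch) = m.hat_rows !== nothing

function predict_ci(m::LoessSketch)
    has_uncertainty(m) || throw(ArgumentError(
        "model was fitted with keep_hat = false; refit to get confidence bounds"))
    m.hat_rows  # placeholder: real code would build the bounds from these rows
end
```

The upside is that the memory cost becomes opt-in; the downside is a runtime failure mode that only surfaces when a user asks for bounds.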
@devmotion thanks for the comments. I believe I've addressed all of them now.
@palday did you get a chance to run your examples? I'm quite eager to get error bounds in Loess plots, so it would be good to find a path forward here.
@palday bump. Do you object to this or are you okay with moving forward here?
@devmotion it looks like the benchmarks now complete but then fail afterwards with an error about the repo being dirty. Have you seen that error before?
No, I have never seen this error. I wonder if it could be related to either the GHA update or the 1.12.2 release.
Ah, I think it could be https://github.com/JuliaCI/PkgBenchmark.jl/issues/135 which was caused by a Project.toml file that Pkg automatically reformatted, and hence the repo seemed to be dirty in CI.
Edit: Although the Project.toml seems fine to me...
Generally, maybe the benchmark setup would be a bit simpler with AirspeedVelocity.jl. But of course either approach should be fine.
@devmotion your intuition was right here. With Julia 1.12.2, the `Pkg.dev` command added a `sources` entry to the Project.toml. For now, I'm working around the error by committing the change before running the benchmarks. Given that it is still possible to run the n=100000 benchmarks with only a modest time penalty, I suggest we go ahead and merge this one. To give some perspective: the n=100000 example runs in 75 seconds in R, which uses a derived version of the original Fortran implementation; with the changes in this PR, the same problem runs in 63 milliseconds.