skglm
skglm copied to clipboard
FEAT - Use @njit(cache=True)
Use caching option from Numba
I explore a little bit the skglm source code and I realized you are using Numba decorator @nijt
.
I was wondering if it makes senses to switch to @nijt(cache=True)
.
Indeed, according to Numba documentation caching compiled functions reduces the future compilation time.
Thanks for the pointer @PascalCarrivain, I was not aware of this feature.
It seems to help a lot (not for the first compilation, but for the subsequent calls), on a very CPU bound problem. From the first run to the second, I change only the value of cache
in the following snippet)
In [1]: %run numba_cache.py
0.5166642830008641
0.10939704200063716
0.11151579199940898
0.11363932000131172
0.10976769300032174
In [2]:
Do you really want to exit ([y]/n)?
(base) ➜ scripts git:(main) ✗ ipython
Python 3.10.12 | packaged by conda-forge | (main, Jun 23 2023, 22:40:32) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.6.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: %run numba_cache.py
0.529917010000645
0.004792376999830594
0.004180643998552114
0.004381413000373868
0.004056714000398642
import numpy as np
import time
from numba import njit
a = np.arange(10_000)
for i in range(5):
def my_sum(a):
acc = 0
for val in a:
acc += val
return acc
t0 = time.perf_counter()
njit(my_sum, cache=True)(a)
t1 = time.perf_counter()
print(t1 - t0)
Can you try to test the impact of using cache=True in our codebase on a real life skglm problem, ie fitting an estimator on a simple problem ?
@mathurinm Yes, I will do it late this year or early next year.
@PascalCarrivain do you know if this can help the first compilation too ?
I do not see a huge difference for the first compilation (at least on my projects).