
ENH - Numba compilation at import instead of execution time

Open Badr-MOUFAD opened this issue 1 year ago • 0 comments

I discovered that we can pre-compile Numba functions without having to run them, by specifying the function's signature in @njit:

import time
import numpy as np
from numba import njit


@njit("f8(f8[:])")
def compute_sum(arr):
    # named `total` to avoid shadowing the built-in `sum`
    total = 0.
    for element in arr:
        total += element
    return total


arr = np.random.randn(10_000)

start = time.perf_counter()
# Runs without compilation overhead: the function was compiled at decoration time
compute_sum(arr)
end = time.perf_counter()

print("total elapsed time:", end - start)

By doing so, we move the entire compilation overhead to import time, which frees us from having to do a first warm-up run to trigger Numba compilation (as is done in the benchmarks).

This is worth considering, but it requires care because we have many functions:

  • the import overhead might be huge unless we control when and where modules are imported
  • many functions, some with long signatures, which raises readability, maintainability, and applicability challenges

The advantages are clear for small examples, and I also tried it on a fairly large codebase. However, I don't have much visibility into its impact on the whole package.


Also related to https://github.com/scikit-learn-contrib/skglm/issues/106

Badr-MOUFAD, Mar 07 '23 17:03