Franck Charras
Reporting some progress on this issue. I've run a more exhaustive grid search of all possible combinations of performance parameters, and it improved performance to up to 70% of dpnp's, which...
With the full grid search we achieve 90% of dpnp performance, which is way above initial expectations!
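For context, the exhaustive search described above can be sketched roughly as follows. This is a hedged illustration only: the parameter names (`work_group_size`, `sub_group_size`, `items_per_thread`) and the timing target are hypothetical, not the actual sklearn-numba-dpex tuning code.

```python
import itertools
import timeit

# Hypothetical search space of kernel performance parameters
# (names are illustrative, not the actual sklearn-numba-dpex parameters).
search_space = {
    "work_group_size": [64, 128, 256],
    "sub_group_size": [8, 16, 32],
    "items_per_thread": [1, 2, 4],
}

def benchmark(params):
    # Stand-in for timing one kernel configuration; a real autotuner
    # would launch the kernel and synchronize the device around the timer.
    return timeit.timeit(lambda: sum(range(1000)), number=10)

def grid_search(space, benchmark_fn):
    """Exhaustively time every combination and return the fastest one."""
    best_params, best_time = None, float("inf")
    for values in itertools.product(*space.values()):
        params = dict(zip(space.keys(), values))
        elapsed = benchmark_fn(params)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time

best, t = grid_search(search_space, benchmark)
```

Exhaustive search is affordable here because the space is small (27 combinations above); a real run would cache the winning combination per device.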
Sorry, this was a false alarm: the autotuner was indeed able to find the fastest combination of parameters, but we had a bug for some of them that caused the...
The implementation I benchmarked in https://github.com/soda-inria/sklearn-numba-dpex/pull/102 explicitly loads all the sliding windows into shared memory, but I believe I've seen implementations that instead rely on the cache to implicitly enable fast...
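As a rough illustration of the explicit pattern (a plain NumPy/CPU sketch, not the actual kernel from #102): each simulated work group first copies the input slice it needs, plus a halo, into a local buffer, and every work item then reads its window from that buffer rather than from global memory.

```python
import numpy as np

def sliding_window_sums_explicit(x, window, group_size):
    """CPU sketch of the 'explicit shared memory' pattern: each simulated
    work group copies its input slice (plus a halo of window - 1 elements)
    into a local buffer, then computes window sums from that buffer."""
    n_out = x.size - window + 1
    out = np.empty(n_out, dtype=x.dtype)
    for group_start in range(0, n_out, group_size):
        group_stop = min(group_start + group_size, n_out)
        # One copy from "global" to "local" memory per work group.
        local = x[group_start:group_stop + window - 1].copy()
        for i in range(group_start, group_stop):
            j = i - group_start
            out[i] = local[j:j + window].sum()
    return out

x = np.arange(10, dtype=np.float64)
result = sliding_window_sums_explicit(x, window=3, group_size=4)
```

The cache-based alternative would simply index the global array directly and count on overlapping reads hitting L1/L2.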
The `numba.cuda.random` RNG does not come from low-level functions but [is implemented in numba](https://github.com/numba/numba/blob/main/numba/cuda/random.py), so in fact the current state of `numba.cuda.random` is easy to port or to mimic. E.g...
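For reference, the core step is tiny, which is why porting is easy. Here is a minimal pure-Python sketch of one xoroshiro128+ step (the generator family used by `numba.cuda.random`); the rotation constants below are from one published variant and may not match numba's source exactly.

```python
MASK64 = (1 << 64) - 1

def rotl(x, k):
    """64-bit left rotation."""
    return ((x << k) | (x >> (64 - k))) & MASK64

def xoroshiro128p_next(state):
    """One xoroshiro128+ step: returns a 64-bit value and advances state.
    state is a list [s0, s1] of two 64-bit integers (not both zero)."""
    s0, s1 = state
    result = (s0 + s1) & MASK64
    s1 ^= s0
    state[0] = rotl(s0, 55) ^ s1 ^ ((s1 << 14) & MASK64)
    state[1] = rotl(s1, 36)
    return result
```

Since the whole state is two 64-bit integers per thread, a per-work-item state array maps naturally onto a GPU kernel.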
We just merged it; see https://github.com/soda-inria/sklearn-numba-dpex/commit/6190f8f2ffc9a3872ac07a58137a7c59131966a8 for the module and tests. It's true that the JAX RNG interface is nicer, but the xoroshiro128 PR was on its way to being merged before we...
Hello @diptorupd, sure, I can do that, TY for the invitation! I will be busy early this week; I'll start working on it mid-week if that's fine for you.
I think it's fixed and the issue can be closed @ogrisel: https://github.com/IntelPython/numba-dpex/blob/main/numba_dpex/config.py#L36-L45
(Sorry for the lack of feedback this week, I took some time off.) Practically speaking, I wouldn't say this issue is too bothersome; it's more a matter of clarity. Python...
For me, #960 indeed fixes the issue. I just want to add that I realized a mistake I made in the OP: when saying > it returns a `float64` I didn't...