numba-dpex
Lower than expected performance in blackscholes numpy implementation
The blackscholes numpy implementation in dpbench is ~26X slower than the corresponding kernel and prange implementations.
How to reproduce:
- Follow the instructions to set up dpbench.
- Run the black_scholes benchmark:
python -c "import dpbench; dpbench.run_benchmark(\"black_scholes\")"
The slowdown may be related to kernel launch overhead in the JitKernel custom dispatcher class. The overhead is especially noticeable with small problem sizes. The experimental.dispatcher.KernelDispatcher fixes the launch overhead.
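One way to check whether a fixed per-call cost is what dominates is to time the same call at several problem sizes. The sketch below is only an illustration using the standard library: `median_call_time`, `sweep`, and `stand_in_workload` are hypothetical names, not part of dpbench or numba-dpex, and the placeholder workload would need to be replaced by the call under test.

```python
import time


def median_call_time(fn, arg, repeats=20):
    """Return the median wall-clock time in seconds of fn(arg)."""
    fn(arg)  # warm-up call, so one-time costs such as JIT compilation are excluded
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


def stand_in_workload(n):
    # Hypothetical placeholder: swap in the benchmark call being investigated.
    return sum(i * 0.5 for i in range(n))


def sweep(fn, sizes, repeats=20):
    """Map each problem size to its median call time."""
    return {n: median_call_time(fn, n, repeats) for n in sizes}


if __name__ == "__main__":
    for n, t in sweep(stand_in_workload, [2**10, 2**14, 2**18]).items():
        print(f"n={n:>8d}  median={t * 1e6:10.1f} us  per-element={t / n * 1e9:8.2f} ns")
```

If the median time stays roughly flat as the size shrinks while the per-element time climbs, a fixed per-call cost is the likely cause. When timing device code, the call being measured should block until the result is ready, otherwise only the launch itself is timed.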
Can you please reevaluate with the new dispatcher?