mir_eval
Acceleration || branching from #68
The issue of acceleration came up in #68, either via numba or via Cython. It seems like we should branch the discussion off into its own issue.
Cython
- Pro: minimal dependencies, pip installable, wheels
- Con: packaging can be tricky, significant developer time required
Numba
- Pro: minimal developer time required
- Con: not pip installable due to LLVM dependency
My two cents: despite the dependency chain problem, I'd prefer numba over cython because it would be easier to deploy across the entire module.
In librosa, we handled this by making the numba dependency optional, and providing a dummy decorator optional_jit that does a no-op if numba is not available, and jit-compilation if it is. That way, everything still works, but if you want to get the acceleration benefits, it's on you to install numba.
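For reference, here's a minimal sketch of that pattern (illustrative only, not necessarily librosa's exact implementation):

```python
try:
    from numba import jit
    _HAVE_NUMBA = True
except ImportError:
    _HAVE_NUMBA = False


def optional_jit(*args, **kwargs):
    """Apply numba.jit when numba is installed; otherwise return the function unchanged."""
    if _HAVE_NUMBA:
        return jit(*args, **kwargs)
    # Bare usage: @optional_jit
    if len(args) == 1 and callable(args[0]) and not kwargs:
        return args[0]
    # Parameterized usage: @optional_jit(nopython=True)
    return lambda func: func
```

Then any kernel we want accelerated just gets decorated with `@optional_jit(nopython=True)`, and users without numba still get the pure-Python behavior.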
Side note: if we also add conda(-forge) package distribution, we can easily add numba as a hard dependency there.
What do y'all think?
> Side note: if we also add conda(-forge) package distribution, we can easily add numba as a hard dependency there.
+1!
What part of mir_eval is too slow? I think most of the metrics are reasonably fast, and those that aren't could probably benefit most from some regular Python hand-optimization. I don't think it's worth considering Cython, given the goal of "a transparent implementation". numba would be fine with the optional jit decorator.
> What part of mir_eval is too slow?
The separation module is particularly slow. Yes, it could benefit from a thorough efficiency audit -- I've noticed some pretty obvious examples of redundant computation -- but I think a lot of its slowness comes from tight-loop calculations that don't trivially vectorize, but would be easily handled by numba.
Some of the segmentation / hierarchy metrics could also benefit from optimization.
I haven't benchmarked the rest of it, but those alone seem worth the effort to me.
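To illustrate the kind of loop I mean, here's a made-up kernel (not actual mir_eval code): data-dependent control flow that resists vectorization but jits easily.

```python
import numpy as np
from numba import jit


@jit(nopython=True)
def count_matches(ref_times, est_times, window):
    """Greedily match sorted event times within a tolerance window.

    The early-exit inner loop depends on the data, so it doesn't vectorize
    cleanly with numpy, but numba compiles it to a tight native loop.
    """
    matched = 0
    j = 0
    for i in range(ref_times.shape[0]):
        # Skip estimates that fall entirely before the current reference event
        while j < est_times.shape[0] and est_times[j] < ref_times[i] - window:
            j += 1
        if j < est_times.shape[0] and abs(est_times[j] - ref_times[i]) <= window:
            matched += 1
            j += 1
    return matched
```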
I think as a first pass it's worth looking in detail at what metrics are prohibitively slow and if there's an easy way to make them faster without numba (since we can assume for the time being most users will not have numba installed). Adding numba + optional_jit seems like a no-brainer after that.
Sure.
Side note: is it worth opening a separate issue to put mir_eval in conda-forge?
> Side note: is it worth opening a separate issue to put mir_eval in conda-forge?
Sure, wouldn't hurt. I would prioritize it based on how much people are complaining about it not being in conda-forge.
:complain:
😄
About which functions?
Btw, numba now has wheels on pypi, so we probably could make it a hard dependency.
Update: llvmlite >=0.20 (now up) ships with windows wheels.
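If we did go the hard-dependency route, the packaging change would be small. A hypothetical setup.py excerpt (the surrounding metadata here is illustrative, not mir_eval's actual setup.py):

```python
from setuptools import setup, find_packages

setup(
    name='mir_eval',
    packages=find_packages(),
    install_requires=[
        'numpy',
        'scipy',
        'numba',  # wheels on PyPI now; llvmlite >= 0.20 adds Windows wheels
    ],
)
```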
The separation metrics are too slow!!! Computing the score took up most of the time when I was training my model. Could you please add jit to those eval functions, e.g. bss_eval_sources?
> Could you please add jit to those eval functions
I'm not sure that jitting would help for bss eval. Last I checked, the bottleneck here was in computing a bunch of frame-wise least squares solutions, and that part is already calling out to C under the hood (via numpy/scipy).
But do feel free to try it out and send a PR if it improves.
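To make that concrete, here's a simplified sketch (not mir_eval's actual bss_eval code) of the frame-wise least-squares pattern: the per-frame solve already runs in compiled LAPACK via numpy, so jitting the surrounding Python loop buys little.

```python
import numpy as np


def framewise_projection(reference, estimate, frame_len=2048, hop=1024):
    """Illustrative only: least-squares gain of each frame of `estimate`
    against `reference`. The lstsq call dominates the runtime and is
    already compiled code, so numba mostly just speeds up the loop overhead."""
    coeffs = []
    for start in range(0, len(reference) - frame_len + 1, hop):
        ref_frame = reference[start:start + frame_len]
        est_frame = estimate[start:start + frame_len]
        # Solve argmin_c || est_frame - c * ref_frame ||^2 for a scalar gain c
        c, *_ = np.linalg.lstsq(ref_frame[:, None], est_frame, rcond=None)
        coeffs.append(c[0])
    return np.array(coeffs)
```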
> I'm not sure that jitting would help for bss eval. Last I checked, the bottleneck here was in computing a bunch of frame-wise least squares solutions, and that part is already calling out to C under the hood (via numpy/scipy).
Can confirm @bmcfee: I tried stripping out some parts for numba and got a bit more than a 5% speedup.