[FR] Option to set the number of threads?

Open etiennebacher opened this issue 1 year ago • 2 comments

Is your feature request related to a problem? Please describe.

If I'm not mistaken, zoomerjoin uses all threads available on the laptop, which explains in part its great performance. It would be nice to be able to configure the number of threads, so that I can use a part of the CPU for other tasks (I'm surprised that CRAN accepted the package given they don't want more than 2 threads to be used by tests and examples).

Describe the solution you'd like

Either a function or an option to specify the number of threads that can be used by zoomerjoin.

Describe alternatives you've considered

/

Additional context

/

Thanks again for this great package!

etiennebacher, Feb 09 '24 15:02

This is totally on my list of priorities as well. Right now, you are able to constrain the number of threads using the RAYON_NUM_THREADS environment variable, but this can't be changed after the rayon global thread pool is set up (although I'm not an expert on how rayon works).

The package passes the CRAN checks because, when it is attached, it looks for the _R_CHECK_LIMIT_CORES_ environment variable set on the CRAN machines and, if it's present, sets the number of cores used to two. This obviously isn't ideal, so at some point I want to look into creating a thread pool every time a joining function is run, so that the number of threads can be controlled by an argument.
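
For readers curious what such an attach-time check could look like, here is a minimal sketch in base R. It is not zoomerjoin's actual code: the helper name `limit_threads_for_check()` and its `max_threads` argument are hypothetical, and it assumes the limit is applied through RAYON_NUM_THREADS, which the thread does not confirm.

```r
# Hypothetical helper, NOT zoomerjoin's real implementation.
# If the CRAN check variable is present, cap rayon at `max_threads`
# (assumption: the cap is applied via RAYON_NUM_THREADS).
limit_threads_for_check <- function(max_threads = 2L) {
  check_flag <- Sys.getenv("_R_CHECK_LIMIT_CORES_", unset = "")
  if (nzchar(check_flag)) {
    # Environment variables are strings, so convert explicitly.
    Sys.setenv(RAYON_NUM_THREADS = as.character(max_threads))
    return(TRUE)
  }
  FALSE
}
```

A package could call a helper like this from its `.onAttach()` or `.onLoad()` hook. It returns `TRUE` when the cap was applied and `FALSE` otherwise.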

beniaminogreen, Feb 10 '24 03:02

I confirm that putting Sys.setenv(RAYON_NUM_THREADS = 2) (for example) after loading the package works fine, thanks for the workaround!
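
Spelled out as a short script, the workaround might look like the sketch below. The value 2 is only an example, and the `library()` call is commented out so the snippet runs without the package installed. Based on the maintainer's note about the rayon global thread pool, the variable presumably has to be set before the first join call in the session.

```r
# library(zoomerjoin)  # uncomment when the package is installed

# Cap rayon at 2 threads (example value). Set this before the first
# join runs, since the global pool cannot be resized once it exists.
Sys.setenv(RAYON_NUM_THREADS = 2)

# Confirm what the R session now reports; values are stored as strings.
threads <- Sys.getenv("RAYON_NUM_THREADS")
print(threads)
```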

etiennebacher, Feb 10 '24 08:02