modAL

Add support for parallel querying

Open · zacps opened this issue 4 years ago · 1 comment

When the number of unlabelled points is very large, it may be beneficial to copy the classifier into a number of threads or processes, query chunks of the data separately, and then recombine and rank the results.

Query methods should take an n_jobs parameter which controls this behaviour.
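A minimal sketch of the proposed behaviour, assuming a scikit-learn-style estimator that exposes `predict_proba` and using least-confidence uncertainty as the ranking score. `parallel_uncertainty_query`, `_query_chunk` and their parameters are hypothetical illustrations, not part of modAL's API. Each thread scores one chunk of the pool and returns only its own top `n_instances` candidates. Those candidates are then merged and ranked globally, which selects the same rows as ranking the whole pool at once.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _query_chunk(estimator, X_pool, chunk_idx, n_instances):
    """Score one chunk of the pool; return its top candidates as (pool indices, scores)."""
    proba = estimator.predict_proba(X_pool[chunk_idx])
    # Least-confidence uncertainty: 1 - probability of the most likely class.
    uncertainty = 1.0 - np.max(proba, axis=1)
    # Keep only this chunk's best candidates (stable sort: ties keep pool order).
    local_top = np.argsort(-uncertainty, kind="stable")[:n_instances]
    return chunk_idx[local_top], uncertainty[local_top]


def parallel_uncertainty_query(estimator, X_pool, n_instances=1, n_jobs=2):
    """Hypothetical chunked uncertainty query (not modAL's API).

    Returns (query_idx, X_pool[query_idx]) for the n_instances most uncertain rows.
    """
    X_pool = np.asarray(X_pool)
    if len(X_pool) == 0:
        raise ValueError("X_pool is empty")
    n_jobs = max(1, min(n_jobs, len(X_pool)))
    # Contiguous, non-empty index chunks, one per worker.
    chunks = np.array_split(np.arange(len(X_pool)), n_jobs)

    # Threads share the live estimator, so no copy or clone is made.
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        results = list(executor.map(
            lambda idx: _query_chunk(estimator, X_pool, idx, n_instances), chunks
        ))

    # Recombine the per-chunk candidates and rank them globally.
    cand_idx = np.concatenate([idx for idx, _ in results])
    cand_score = np.concatenate([score for _, score in results])
    best = np.argsort(-cand_score, kind="stable")[:n_instances]
    query_idx = cand_idx[best]
    return query_idx, X_pool[query_idx]
```

Because the threads read the current estimator on every call, each query reflects the latest fit. The speed-up depends on `predict_proba` releasing the GIL, as NumPy-heavy code often does. A process pool would sidestep the GIL but would need a module-level worker function and would pickle a copy of the estimator for every task.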

zacps, Feb 15 '21

Just adding a simple reference in case it helps anyone:

dask_ml has a ParallelPostFit wrapper that does exactly this.

Edit: This wrapper clones the underlying estimator when it is instantiated. In the context of active learning that might be an issue, since the estimator is updated quite frequently.

remiadon, May 04 '21