modAL
Add support for parallel querying
When the number of unlabelled points is very large, it may be beneficial to copy the classifier into several threads or processes, query chunks of the data separately, and then recombine and rank the results.
Query methods should take an n_jobs parameter which controls this behaviour.
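A rough sketch of what this could look like, in case it helps the discussion. This is not modAL's API: `parallel_uncertainty_query` and `_chunk_uncertainty` are hypothetical names, the scoring is plain least-confidence uncertainty, and the only assumption about the classifier is that it exposes a scikit-learn style `predict_proba`. It uses threads that share one fitted classifier (read-only) instead of copying it; whether that actually speeds things up depends on whether the estimator's `predict_proba` releases the GIL, and a process-based version would need to pickle the classifier into each worker.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def _chunk_uncertainty(classifier, X_chunk):
    # Least-confidence uncertainty: 1 - probability of the most likely class.
    proba = classifier.predict_proba(X_chunk)
    return 1.0 - np.max(proba, axis=1)


def parallel_uncertainty_query(classifier, X_pool, n_instances=1, n_jobs=2):
    """Hypothetical helper: score chunks of the pool in parallel, then rank globally."""
    X_pool = np.asarray(X_pool)
    # Never create more chunks than there are samples.
    n_chunks = max(1, min(n_jobs, len(X_pool)))
    chunks = np.array_split(X_pool, n_chunks)

    # Executor.map preserves input order, so the concatenated scores
    # line up with the rows of X_pool.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        scores = list(pool.map(lambda c: _chunk_uncertainty(classifier, c), chunks))

    uncertainty = np.concatenate(scores)
    # Rank all samples together and keep the n_instances most uncertain ones.
    query_idx = np.argsort(-uncertainty, kind="stable")[:n_instances]
    return query_idx, X_pool[query_idx]
```

The key point is that chunks are only scored in parallel; the ranking happens once over the recombined scores, so the result matches a serial query with the same strategy.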
Just adding a simple reference if that helps anyone
dask_ml has a ParallelPostFit wrapper that does exactly this
Edit: This wrapper clones the underlying estimator when it is instantiated. In the context of active learning that might be an issue, as the estimator is updated quite frequently.
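To illustrate why a copy taken at wrap time is a concern, here is a toy sketch. It does not use dask_ml or modAL: `ToyLearner`, `CopyingWrapper` and `ReferencingWrapper` are hypothetical stand-ins, and `copy.deepcopy` is only an assumption standing in for whatever cloning a real wrapper does.

```python
import copy


class ToyLearner:
    """Hypothetical stand-in for an active learner; it just counts labels seen."""

    def __init__(self):
        self.n_seen = 0

    def teach(self, n_new):
        self.n_seen += n_new


class CopyingWrapper:
    """Takes a snapshot of the estimator when the wrapper is created."""

    def __init__(self, estimator):
        self.estimator = copy.deepcopy(estimator)


class ReferencingWrapper:
    """Keeps a reference, so later updates to the estimator stay visible."""

    def __init__(self, estimator):
        self.estimator = estimator


learner = ToyLearner()
copying = CopyingWrapper(learner)
referencing = ReferencingWrapper(learner)

# One active learning round: the learner is updated after the wrappers exist.
learner.teach(10)

print(copying.estimator.n_seen)      # 0: the snapshot is stale
print(referencing.estimator.n_seen)  # 10: sees the update
```

With a copying wrapper, each active learning round would need the wrapper to be rebuilt after teaching, otherwise queries are scored with an outdated model.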