
Training and batch set size for Ranked Batch Sampling

Open ricomnl opened this issue 4 years ago • 0 comments

Hi,

I've tried setting up the Ranked Batch Sampling learner with a full dataset of ~2M samples and a batch size of ~8k samples. The code seemingly gets stuck after running learner.query(), and I'm wondering if the batch size is too large (it's significantly larger than the one used in the tutorial). Is ~8k samples too large for the batch size?

My setup is similar to the one from the tutorial:

from functools import partial
from modAL.batch import uncertainty_batch_sampling
from modAL.models import ActiveLearner
...

N_QUERIES = 5
BATCH_SIZE = 8417
preset_batch = partial(uncertainty_batch_sampling, n_instances=BATCH_SIZE)

learner = ActiveLearner(
    estimator=model,
    X_training=x_train,
    y_training=y_train,
    query_strategy=preset_batch,
)

results = []
for index in range(N_QUERIES-1):
    query_index, query_instance = learner.query(x_pool)
    
    x, y = x_pool[query_index], y_pool[query_index]
    learner.teach(X=x, y=y)
    ...
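For what it's worth, ranked batch sampling selects instances greedily, rescanning the remaining pool once per selected instance, so the cost grows roughly with n_instances × pool size; at ~8k × ~2M that is on the order of billions of distance evaluations, which can look like a hang rather than a crash. Below is a simplified, hypothetical sketch of that greedy loop to make the scaling visible. It is not modAL's actual implementation: the function name `ranked_batch_sketch` and the exact similarity/score formulas are illustrative assumptions loosely based on the ranked batch-mode scoring idea (alpha-weighted mix of diversity and uncertainty).

```python
import numpy as np

def ranked_batch_sketch(X_pool, uncertainty, n_instances, n_labeled):
    """Toy greedy ranked-batch selection (illustrative, not modAL's code).

    Each iteration picks the pool point maximizing
        alpha * (1 - similarity to already-selected points)
        + (1 - alpha) * uncertainty,
    then updates pairwise distances against the whole pool. That update is
    O(pool_size * n_features) per pick, so the full batch costs roughly
    O(n_instances * pool_size * n_features) -- the source of the slowdown
    for very large batches over very large pools.
    """
    n = len(X_pool)
    selected = []
    mask = np.ones(n, dtype=bool)          # True = still in the pool
    min_dist = np.full(n, np.inf)          # distance to nearest selected point

    for _ in range(n_instances):
        alpha = mask.sum() / (mask.sum() + n_labeled + len(selected))
        similarity = 1.0 / (1.0 + min_dist)  # 0 when nothing selected yet
        score = alpha * (1.0 - similarity) + (1.0 - alpha) * uncertainty
        score[~mask] = -np.inf               # never re-pick an instance
        idx = int(np.argmax(score))
        selected.append(idx)
        mask[idx] = False
        # Full-pool distance update: the expensive O(n * d) step per pick.
        d = np.linalg.norm(X_pool - X_pool[idx], axis=1)
        min_dist = np.minimum(min_dist, d)

    return selected
```

With similarity starting at zero, the first pick reduces to plain argmax of uncertainty; every later pick pays the full-pool distance update, which is why shrinking n_instances (or pre-clustering the pool) is usually the first thing to try when query() appears to stall.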

ricomnl avatar May 31 '21 15:05 ricomnl