modAL
Training and batch set size for Ranked Batch Sampling
Hi,
I've tried setting up the Ranked Batch Sampling learner with a full dataset of ~2M samples and a batch size of ~8k samples. The code seemingly gets stuck after running learner.query(), and I'm wondering if the batch size is too large (it's significantly larger than the one used in the tutorial).
Is ~8k samples too large for the batch size?
My setup is similar to the one from the tutorial:
from functools import partial
from modAL.batch import uncertainty_batch_sampling
...
N_QUERIES = 5
BATCH_SIZE = 8417
preset_batch = partial(uncertainty_batch_sampling, n_instances=BATCH_SIZE)
learner = ActiveLearner(
    estimator=model,
    X_training=x_train,
    y_training=y_train,
    query_strategy=preset_batch,
)
results = []
for index in range(N_QUERIES - 1):
    query_index, query_instance = learner.query(x_pool)
    x, y = x_pool[query_index], y_pool[query_index]
    learner.teach(X=x, y=y)
...
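For context on why the query can look stuck rather than crash: if I read the strategy correctly, ranked batch sampling picks the batch one instance at a time, and each round rescans the remaining pool to recompute each candidate's distance to the nearest labeled/selected point. The cost therefore grows roughly with n_instances × pool size, so 8417 instances over a ~2M pool means billions of distance evaluations rather than a single pass. Below is a simplified NumPy-only sketch of that selection loop (my own toy reimplementation of the Cardoso et al. scoring, not modAL's actual code; `ranked_batch_sketch` and its arguments are hypothetical names) to illustrate where the per-instance rescan happens:

```python
import numpy as np

def ranked_batch_sketch(X_pool, uncertainty, X_labeled, n_instances):
    """Toy sketch of ranked batch selection (score = alpha * (1 - similarity)
    + (1 - alpha) * uncertainty). Each of the n_instances rounds rescans the
    whole remaining pool, which is why the cost blows up for large batches."""
    labeled = X_labeled.copy()
    mask = np.ones(len(X_pool), dtype=bool)
    selected = []
    for _ in range(n_instances):
        idx_pool = np.flatnonzero(mask)
        cand = X_pool[idx_pool]
        # Distance from every remaining candidate to its nearest labeled point
        # -- recomputed every round, over the full remaining pool.
        d = np.linalg.norm(cand[:, None, :] - labeled[None, :, :], axis=2).min(axis=1)
        similarity = 1.0 / (1.0 + d)
        # Weight shifts from diversity toward uncertainty as labels accumulate.
        alpha = len(idx_pool) / (len(idx_pool) + len(labeled))
        score = alpha * (1.0 - similarity) + (1.0 - alpha) * uncertainty[idx_pool]
        best = idx_pool[np.argmax(score)]
        selected.append(best)
        mask[best] = False
        labeled = np.vstack([labeled, X_pool[best][None, :]])
    return np.array(selected)
```

If that scaling is the issue, timing learner.query() with a much smaller n_instances (say a few hundred) and extrapolating should tell you quickly whether 8k is feasible for your pool, before assuming the process has hung.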