alpha_mix_active_learning

Random Sampling from Labeled Indices

Open zuliani99 opened this issue 9 months ago • 0 comments

Hi authors, thank you for your great work!

I have a question about the final part of the query function:

if len(selected_idxs) < n:
    remained = n - len(selected_idxs)
    idx_lb = copy.deepcopy(self.idxs_lb)
    idx_lb[selected_idxs] = True
    selected_idxs = np.concatenate([selected_idxs, np.random.choice(np.where(idx_lb == 0)[0], remained)])
    print('picked %d samples from RandomSampling.' % (remained))

Why are you performing this random sampling from the labeled set? Doing so can include duplicate indices in the selection, am I correct? Moreover, I would expect the code to break once the entries for the indices we have just selected are set to True, which looks odd to me.
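To make the duplicate concern concrete, here is a minimal standalone sketch. It assumes `idxs_lb` is a boolean mask over the whole pool (my reading of the snippet, not confirmed), and `fill_randomly` and its arguments are hypothetical names. Under that assumption the candidates are the positions where the mask is still False, and since `choice` defaults to `replace=True`, the random top-up can return the same index more than once.

```python
import numpy as np

def fill_randomly(selected_idxs, idxs_lb, n, rng):
    """Hypothetical stand-in for the quoted branch: top up to n indices."""
    mask = idxs_lb.copy()
    mask[selected_idxs] = True               # treat already-selected items as taken
    candidates = np.where(mask == 0)[0]      # positions still marked False
    remained = n - len(selected_idxs)
    # Like np.random.choice, Generator.choice defaults to replace=True.
    extra = rng.choice(candidates, remained)
    return np.concatenate([selected_idxs, extra])

# Toy pool of 6 items: 2 labeled, 1 already selected, 3 candidates left.
idxs_lb = np.array([True, True, False, False, False, False])
out = fill_randomly(np.array([2]), idxs_lb, n=5, rng=np.random.default_rng(0))
# 4 draws from only 3 candidates, so at least one index must repeat.
```

The toy sizes are chosen so that a repeat is forced. With a large candidate pool duplicates would only be occasional, which makes them easy to miss.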

By contrast, the version below should draw the remaining observations at random from the unlabeled set of indices, excluding the indices already selected by your sampling method.

if len(selected_idxs) < n_top_k_obs:
    remained = n_top_k_obs - len(selected_idxs)
    # Mark the positions in rand_unlab_sample that were already selected.
    bool_idx_unlb = np.zeros(len(self.rand_unlab_sample))
    bool_idx_unlb[[self.rand_unlab_sample.index(idx) for idx in selected_idxs]] = 1

    # Draw the remainder from the unselected unlabeled indices, without replacement.
    selected_idxs = np.concatenate([
        selected_idxs,
        np.random.choice(np.array(self.rand_unlab_sample)[np.where(bool_idx_unlb == 0)[0]], remained, replace=False)
    ])
    logger.info('picked %d samples from RandomSampling.' % (remained))
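The same idea can be sketched more compactly with `np.setdiff1d`, which avoids the per-element `list.index` lookup. This is only an illustration under stated assumptions: `top_up_from_unlabeled` and its parameters are hypothetical names, and `unlabeled_idxs` stands in for `self.rand_unlab_sample`.

```python
import numpy as np

def top_up_from_unlabeled(selected_idxs, unlabeled_idxs, n, rng=None):
    """Hypothetical helper: extend selected_idxs to n unique unlabeled indices."""
    rng = np.random.default_rng() if rng is None else rng
    selected_idxs = np.asarray(selected_idxs, dtype=int)
    remained = n - len(selected_idxs)
    if remained <= 0:
        return selected_idxs
    # Unlabeled indices that have not been selected yet (sorted, unique).
    candidates = np.setdiff1d(unlabeled_idxs, selected_idxs)
    # replace=False guarantees no index is drawn twice; it raises ValueError
    # if fewer than `remained` candidates are available.
    extra = rng.choice(candidates, remained, replace=False)
    return np.concatenate([selected_idxs, extra])

result = top_up_from_unlabeled([4, 7], [1, 4, 5, 7, 9, 12], n=5,
                               rng=np.random.default_rng(0))
```

Here `result` keeps `4` and `7` in front and adds three distinct indices drawn from `1, 5, 9, 12`.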

Thank you for your time.

Best regards, Riccardo

zuliani99, May 05 '24 14:05