
Is there a way to perform batch-mode active learning?

lironesamoun opened this issue 8 years ago · 8 comments

Hi,

Instead of having unlabeled data arrive as a stream, is there a way with libact to perform batch-mode active learning, meaning the user can select multiple images at once (both positive and negative)?

Thank you in advance.

lironesamoun avatar Feb 09 '17 15:02 lironesamoun

We don't officially support batch-mode active learning yet.

However, some algorithms can provide this with a slight modification. Take uncertainty sampling for example: you can change the following line

https://github.com/ntucllab/libact/blob/master/libact/query_strategies/uncertainty_sampling.py#L111

by replacing np.argmin with something like an n-smallest selection that returns the indices of the n most uncertain samples.
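As a rough illustration (this is only a sketch, not part of the official libact API): assuming score is the per-sample array that the linked line passes to np.argmin (smaller values meaning more uncertain) and unlabeled_entry_ids is the matching list of pool ids, a helper like the following would return a batch of ids instead of a single one.

```python
import numpy as np

def n_most_uncertain(score, unlabeled_entry_ids, n):
    """Return the ids of the n most uncertain samples (smallest scores)."""
    score = np.asarray(score)
    n = min(n, len(score))
    # argpartition picks the n smallest scores without a full sort
    batch = np.argpartition(score, n - 1)[:n]
    # order the batch from most to least uncertain
    batch = batch[np.argsort(score[batch])]
    return [unlabeled_entry_ids[i] for i in batch]
```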

yangarbiter avatar Feb 12 '17 07:02 yangarbiter

Thank you, I will try that. When do you think you will officially support batch-mode active learning?

My question may be silly, but I made some changes inside the uncertainty_sampling file, then ran sudo python setup.py build and sudo python setup.py install, and my changes are not being picked up. Whatever I do, when I run a Python example nothing changes. Do you know why, or did I miss something?

lironesamoun avatar Feb 14 '17 08:02 lironesamoun

You may want to check environment variables such as PYTHONPATH or PATH. Or maybe you are inside a virtualenv, where you don't need sudo to install the package.
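One quick way to diagnose this (a general Python tip, nothing libact-specific) is to print which copy of the package the interpreter actually imports and which environment it is running in:

```python
import sys
import libact

# If this path is not the copy you edited and installed, the wrong install is used.
print(libact.__file__)
# sys.prefix reveals whether you are inside a virtualenv or the system Python.
print(sys.prefix)
```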

yangarbiter avatar Feb 15 '17 00:02 yangarbiter

Yes, the problem was the virtual environment. Thanks for your help!

lironesamoun avatar Feb 21 '17 08:02 lironesamoun

To avoid opening a new thread: is there a way to associate an image with each feature vector? If I don't use raw pixels as features, I cannot use InteractiveLabeler. For example, if I use an abstract feature like SIFT, how can I associate each feature vector with its image? I tried using a dictionary, but it doesn't work. Thanks in advance.

lironesamoun avatar Feb 21 '17 10:02 lironesamoun

Do you mean that the features are separate from the pixel array?

If that is what you mean, I think you can try passing the corresponding pixel array to the label function of InteractiveLabeler.

For the label_digits example: https://github.com/ntucllab/libact/blob/master/examples/label_digits.py#L90

Instead of lb = lbr.label(trn_ds.data[ask_id][0].reshape(8, 8)), you could use lb = lbr.label(feature_to_image(trn_ds.data[ask_id][0])).

If your image is not stored as a pixel array, you may need to modify the image-rendering part of InteractiveLabeler: https://github.com/ntucllab/libact/blob/master/libact/labelers/interactive_labeler.py#L32
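Note that feature_to_image above is just a placeholder for whatever mapping you maintain yourself; it is not a libact function. A minimal sketch, assuming you keep the images in a list parallel to your feature rows, could be a simple linear search:

```python
import numpy as np

def feature_to_image(feature, features, images):
    """Hypothetical helper: find the image whose feature row matches `feature`."""
    for i, f in enumerate(features):
        if np.array_equal(f, feature):  # compare feature vectors element-wise
            return images[i]
    raise ValueError("feature not found in the feature pool")
```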

yangarbiter avatar Feb 22 '17 08:02 yangarbiter

Yes. For example, I have an array of images (not in pixel form) and an array of associated features: images = [image1, ..., imageN] and features = [features1, ..., featuresN].

So yes, I would like to create a Dataset object, but after doing that I lose track of the corresponding images, since the Dataset only holds the pool X and the labels Y. There is no way to know which X corresponds to which image, if you see what I mean. I will try what you advised.

lironesamoun avatar Feb 22 '17 09:02 lironesamoun

The simplest way of doing it is to take featureX, search through [features1, ..., featuresN] one by one for its index, and then go back to imageX.

Though, if I remember correctly, the index of the features in the dataset won't change during the process. This means that if you create the dataset as Dataset(features, Y), the index returned by i = qs.make_query() should be the same as the original one, so trn_ds.data[i][0] should be the same as features[i], and you can show images[i].

If the first method is too slow for your application, you can use this second method instead; just double-check it on a few small samples first (see the sketch below).
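To make that concrete, here is a hedged sketch of the index-based approach. The names images, features, and y_seed are placeholders (not from libact), and it assumes the image list was built in the same order as the feature matrix passed to Dataset and that images[ask_id] is something InteractiveLabeler can render:

```python
from libact.base.dataset import Dataset
from libact.labelers import InteractiveLabeler
from libact.models import LogisticRegression
from libact.query_strategies import UncertaintySampling

# Assumed inputs: `features` and `images` are parallel lists built in the same
# order; `y_seed` holds known labels for the first n_labeled entries.
n_labeled = 10
trn_ds = Dataset(features,
                 list(y_seed[:n_labeled]) + [None] * (len(features) - n_labeled))

qs = UncertaintySampling(trn_ds, model=LogisticRegression())
lbr = InteractiveLabeler(label_name=['negative', 'positive'])

ask_id = qs.make_query()        # index into the original feature/image order
lb = lbr.label(images[ask_id])  # display the image matching the queried feature
trn_ds.update(ask_id, lb)       # store the label provided by the human
```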

yangarbiter avatar Feb 22 '17 10:02 yangarbiter