Download k samples of each class

Open pkill37 opened this issue 5 years ago • 3 comments

Right now, if you want to download k samples of each class (malignant and benign), you have to manually download the malignant samples first:

python download_archive.py --num-images k --filter malignant

And then download the benign samples into a separate directory:

python download_archive.py --num-images k --filter benign

Otherwise you'd overwrite some of the images. And because some images will have the same filenames, you have to rename them all consistently before merging the two sets.

It would be nice if the script were able to do this in one go.

pkill37, Feb 01 '19 17:02

I quickly realized you can achieve this by doing something like:

python download_archive.py --num-images k --filter malignant
python download_archive.py --num-images k --filter benign --offset k

So maybe you can add this to the documentation? I'm sure it's a fairly common use case, because balanced classes are very important in classification problems.

pkill37, Feb 02 '19 14:02

Turns out the above does not work, for the reason I initially explained (which is also similar to the problem in #22): some of the samples of the first downloaded class get overwritten by the second. I had to download them to separate folders and carefully rename the samples' filenames in order to merge the data.

If we change how the filenames are chosen, we can kill two birds with one stone.
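
For reference, the manual workaround can be sketched roughly as below. This is only an illustration of the rename-and-merge step I did by hand, not something `download_archive.py` does; the function name `merge_class_dirs`, the directory names, and the `<label>_<filename>` naming scheme are all my own assumptions.

```python
import shutil
from pathlib import Path


def merge_class_dirs(class_dirs, dest):
    """Copy files from per-class directories into dest, prefixing each
    filename with its class label so equal names cannot collide.

    class_dirs: mapping of class label -> source directory (hypothetical layout).
    dest: directory that receives the merged, renamed copies.
    Returns the list of files written.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for label, src in class_dirs.items():
        for path in sorted(Path(src).iterdir()):
            if not path.is_file():
                continue  # skip subdirectories
            target = dest / f"{label}_{path.name}"
            if target.exists():
                # Refuse to overwrite instead of silently losing a sample.
                raise FileExistsError(target)
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

With the malignant and benign downloads stored in two separate folders, calling `merge_class_dirs({"malignant": "malignant_dir", "benign": "benign_dir"}, "merged")` would leave both copies of a clashing filename in `merged`, each with its class prefix.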

pkill37, Feb 02 '19 18:02

I think the script currently gives the files the same names they have on the website. Therefore different files should have different names, even across separate script runs.
Can you show an example of a scenario where two different files are given the same name?

GalAvineri, Feb 07 '19 08:02