ISIC-Archive-Downloader
Download k samples of each class
Right now, if you want to download k samples of each class (malignant and benign), you have to manually download the malignant images first:
python download_archive.py --num-images k --filter malignant
And then download the benign images into a separate directory:
python download_archive.py --num-images k --filter benign
Otherwise you would overwrite some of the images. And because some images have the same filenames, you have to rename them all consistently before merging the two sets.
It would be nice if the script were able to do this in one go.
I quickly realized you can achieve this by doing something like:
python download_archive.py --num-images k --filter malignant
python download_archive.py --num-images k --filter benign --offset k
So maybe you can add this to the documentation? I'm sure it's a fairly common use case, because it's very important that classes are balanced in classification problems.
It turns out the above does not work, for the reason I initially explained (which is also similar to the problem in #22): some samples of the first downloaded class are overwritten by the second. I had to download the classes to separate folders and carefully rename the samples' files before I could merge the data.
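For anyone hitting the same thing, the manual rename-and-merge step can be sketched roughly as below. This is only a workaround sketch, not part of download_archive.py: the function name `merge_with_class_prefix` and the folder names are hypothetical, and it assumes each class was already downloaded into its own folder.

```python
import shutil
from pathlib import Path


def merge_with_class_prefix(class_dirs, dest):
    """Copy files from per-class folders into dest, prefixing each
    filename with its class label so equal names cannot collide.

    class_dirs: dict mapping a class label (e.g. "malignant") to the
                folder holding that class's downloads (hypothetical layout).
    dest:       folder that receives the merged, renamed copies.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for label, src in class_dirs.items():
        for path in sorted(Path(src).iterdir()):
            if not path.is_file():
                continue  # skip subfolders
            target = dest / f"{label}_{path.name}"
            if target.exists():
                # Never silently overwrite, which is the original problem.
                raise FileExistsError(f"refusing to overwrite {target}")
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

Calling `merge_with_class_prefix({"malignant": "malignant_dir", "benign": "benign_dir"}, "merged")` would then leave the original folders untouched and put the prefixed copies in `merged`. The originals are copied rather than moved so that a mistake in the merge does not cost a re-download.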
If we change how the filenames are chosen, we can kill two birds with one stone.
I think the script currently gives files the same names they have on the website. Different files should therefore have different names, even across separate script runs.
Can you show an example of a scenario in which two different files are given the same name?