high-fidelity-generative-compression
How to use a custom dataset?
I've changed the dataset path in default_config.py to a custom folder with images:
folder/path
|----/image001.jpg
|----/image002.jpg
...
But it returned:
ValueError: num_samples should be a positive integer value, but got num_samples=0
Posting the full stacktrace would help. If you rename the dataset in default_config.py under DatasetPaths, you must also create a new dataset class with a corresponding name that inherits from the BaseDataset class in src/helpers/datasets.py. There are a few examples in that file.
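For anyone unsure what such a class looks like, here is only a generic sketch built on a plain torch.utils.data.Dataset. The real BaseDataset in src/helpers/datasets.py may expect different constructor arguments, so mirror the existing OpenImages example there rather than copying this verbatim; the class name MyCustomImages and the root/mode arguments are made up for illustration.

import glob
import os

from PIL import Image
from torch.utils.data import Dataset

class MyCustomImages(Dataset):
    """Illustrative dataset: loads every JPG/PNG image under <root>/<mode>/."""

    def __init__(self, root, mode='train', transform=None):
        data_dir = os.path.join(root, mode)
        # Collect both JPEGs and PNGs so len(self) is not zero.
        self.imgs = sorted(glob.glob(os.path.join(data_dir, '*.jpg')) +
                           glob.glob(os.path.join(data_dir, '*.png')))
        self.transform = transform

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        img = Image.open(self.imgs[idx]).convert('RGB')
        return self.transform(img) if self.transform is not None else img

An empty self.imgs list here is exactly what propagates into num_samples=0 in the DataLoader.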
Actually I think the problem is that torch.utils.data is not finding the images in the folder, so it is returning num_samples=0. What is the expected directory structure for the OpenImages dataset?
If you post the stacktrace it would be easier to diagnose. If you look at the parent BaseDataset class you'll notice the dataset directory should contain train/ and test/ subfolders.
In default_config.py:
class DatasetPaths(object):
    OPENIMAGES = '/mnt/ramdisk/root_folder'
    CITYSCAPES = ''
    JETS = ''

class args(object):
    dataset = Datasets.OPENIMAGES
    dataset_path = DatasetPaths.OPENIMAGES
The structure is:
/mnt/ramdisk/root_folder
|----/train
|--------/image001.png
|----/test
|--------/image001.png
|----/val
|--------/image001.png
Traceback (most recent call last):
  File "train.py", line 322, in <module>
    normalize=args.normalize_input_image)
  File "/home/user/anaconda3/envs/HIFIC/high-fidelity-generative-compression-master/src/helpers/datasets.py", line 75, in get_dataloaders
    pin_memory=pin_memory)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 224, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/user/anaconda3/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 96, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
I think the problem was the following line:
self.imgs = glob.glob(os.path.join(data_dir, '*.jpg'))
which would only get JPGs. I pushed a fix to master to account for PNGs as well. Let me know if you still have issues.
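If you're on an older checkout, one workaround is to make the listing extension-agnostic yourself. This is a hedged sketch only; the actual fix on master may look different, and the helper name list_images is made up:

import glob
import os

# Gather every supported extension instead of hard-coding '*.jpg'.
IMG_EXTENSIONS = ('*.jpg', '*.jpeg', '*.png')

def list_images(data_dir):
    files = []
    for pattern in IMG_EXTENSIONS:
        files.extend(glob.glob(os.path.join(data_dir, pattern)))
    return sorted(files)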
Unfortunately that didn't work; I get the same error. What absolute path is expected?
I'm not using the original OpenImages dataset; it's a custom dataset at a custom path, but I did not create a new class in datasets.py.
I'm using OPENIMAGES = '/mnt/ramdisk/openimages', but the files are custom, all inside the subfolders [train, test, val], and all files are PNG.
The code files are in a different path.
I get this error too. But when I make a "val" folder for the "validation" folder, the error disappears. Then I get a new error, "out of memory", even though I set batch_size = 2 and crop_size = 64. Could you post your default_config.py if you are able to run train.py?
@QLaHPD The code does not find the dataset. You can print the data path in BaseDataset and in the derived dataset to make sure.
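A quick, repo-independent sanity check (purely illustrative, not taken from the codebase) is to count the files the glob actually sees for each split:

import glob
import os

root = '/mnt/ramdisk/root_folder'  # your dataset_path from default_config.py
for split in ('train', 'test'):
    found = (glob.glob(os.path.join(root, split, '*.png'))
             + glob.glob(os.path.join(root, split, '*.jpg')))
    print(split, '->', len(found), 'images')  # 0 here explains num_samples=0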
I encountered this error and solved it.
Note that this error may occur even if the model is able to find the dataset. Most people say there is a problem locating the dataset, but this is not always the case.
As in my case, I think you are using a small dataset where there are not enough samples for each iteration. Here are more details.
The default batch size is 8. Assume you set the --n_steps parameter to 1e6. This means there are 1 million (1,000,000) iterations, and each iteration requires 8 samples. Thus, you should have 8 million samples (8 * 1,000,000). If you have fewer than 8 million samples, the following error occurs:
ValueError: num_samples should be a positive integer value, but got num_samples=0
To solve it, you can set a smaller value for the --n_steps parameter. Try 1, for example: --n_steps 1:
python train.py --model_type compression --regime low --n_steps 1
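To make the arithmetic above concrete, here is a throwaway check (illustrative only, not part of the repo; the batch size of 8 and the sample count are assumptions):

batch_size = 8        # default batch size assumed above
n_steps = int(1e6)    # value passed via --n_steps
n_samples = 50_000    # however many images your custom training set has

required = batch_size * n_steps
print('samples required:', required, '| samples available:', n_samples)
if n_samples < required:
    # Cap --n_steps so every step can draw a fresh batch, per the suggestion above.
    print('try: --n_steps', n_samples // batch_size)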
I hope this helps.
I suggest you write your own dataloader and prepare a pre-cropped image dataset so you don't need to crop images every time.
Yes, writing your own dataloader solves this issue.
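If you go the pre-cropping route suggested above, a minimal offline script could look like this. It is a sketch only; the 256-pixel crop size, the source and destination paths, and the use of a single random crop per image are all assumptions:

import glob
import os
import random

from PIL import Image

CROP = 256
src_dir = '/mnt/ramdisk/root_folder/train'
dst_dir = '/mnt/ramdisk/root_folder/train_cropped'
os.makedirs(dst_dir, exist_ok=True)

for path in glob.glob(os.path.join(src_dir, '*.png')):
    img = Image.open(path).convert('RGB')
    w, h = img.size
    if w < CROP or h < CROP:
        continue  # skip images that are too small to crop
    left = random.randint(0, w - CROP)
    top = random.randint(0, h - CROP)
    crop = img.crop((left, top, left + CROP, top + CROP))
    crop.save(os.path.join(dst_dir, os.path.basename(path)))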