deep-text-recognition-benchmark

ValueError: num_samples should be a positive integer value, but got num_samples=0

Open youngsirsk opened this issue 3 years ago • 1 comments

I created an LMDB dataset from custom data (e.g. images generated with trdg that each contain multiple words), following the guide to create the dataset, and trained with the command below:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py \
        --exp_name CRNN_CTC_demo \
        --select_data / \
        --batch_ratio 1 \
        --train_data lmdb_dataset/train/ --valid_data lmdb_dataset/val/ \
        --Transformation TPS --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC

This is what is printed on the screen:

    ------ Use multi-GPU setting ------
    if you stuck too long time with multi-GPU setting, try to set --workers 0
    Filtering the images containing characters which are not in opt.character
    Filtering the images whose label is longer than opt.batch_max_length

    dataset_root: lmdb_dataset/train/
    opt.select_data: ['/']
    opt.batch_ratio: ['1']

    dataset_root: lmdb_dataset/train/ dataset: /
    sub-directory: /. num samples: 0
    num total samples of /: 0 x 1.0 (total_data_usage_ratio) = 0
    num samples of / per batch: 768 x 1.0 (batch_ratio) = 768
    Traceback (most recent call last):
      File "train.py", line 317, in <module>
        train(opt)
      File "train.py", line 31, in train
        train_dataset = Batch_Balanced_Dataset(opt)
      File "/home/yxy/deep-text-recognition-benchmark/dataset.py", line 67, in __init__
        collate_fn=_AlignCollate, pin_memory=True)
      File "/home/yxy/anaconda3/envs/crnn_ctc/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 262, in __init__
        sampler = RandomSampler(dataset, generator=generator)  # type: ignore
      File "/home/yxy/anaconda3/envs/crnn_ctc/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 104, in __init__
        "value, but got num_samples={}".format(self.num_samples))
    ValueError: num_samples should be a positive integer value, but got num_samples=0

It shows that there is an error when loading the data, but I have no idea why or how it happened. When I train with one word per image, it works well.

Works: a single word, for example "code".
Bug: multiple words, for example "I love coding".

By the way, I found that special characters between words cause the same error (e.g. a_b_c_d_e).

I have no idea how to fix the problem. Can anyone help me? Thanks a lot!

youngsirsk, Jun 28 '21 05:06

You need to check this part.

#186

train.py line 25:

    print('Filtering the images containing characters which are not in opt.character')
    print('Filtering the images whose label is longer than opt.batch_max_length')

dataset.py line 168: By default, images containing characters which are not in opt.character are filtered.

dataset.py line 190: We only train and evaluate on alphanumerics (or the pre-defined character set in train.py).

kimlia545, Jul 23 '21 02:07
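To make the filtering described above concrete, here is a minimal sketch of the two checks named in the log output (characters outside opt.character, and labels longer than opt.batch_max_length). It is not the repository's code: the helper `is_kept` is hypothetical, and the character set and maximum length below are assumed values for illustration, so check the actual settings in your train.py.

```python
import re

# Assumed values for illustration only; check the character set and
# batch_max_length that your own train.py actually uses.
CHARACTER = "0123456789abcdefghijklmnopqrstuvwxyz"
BATCH_MAX_LENGTH = 25


def is_kept(label, character=CHARACTER, batch_max_length=BATCH_MAX_LENGTH):
    """Hypothetical helper: True if the label would survive both filters."""
    # Filter 1: the label is longer than the maximum length.
    if len(label) > batch_max_length:
        return False
    # Filter 2: the label contains a character outside the allowed set.
    # re.escape keeps characters such as ']' or '-' from breaking the class.
    out_of_char = "[^" + re.escape(character) + "]"
    return re.search(out_of_char, label.lower()) is None


labels = ["code", "I love coding", "a_b_c_d_e"]
kept = [label for label in labels if is_kept(label)]
print(kept)  # only "code" survives; spaces and underscores are not in the set
```

Under these assumptions, every label that contains a space or an underscore is dropped. A dataset made only of such labels ends up with zero samples, which matches the `num samples: 0` line in the log. Calling `is_kept("I love coding", character=CHARACTER + " ")` returns True, so extending the character set used for training to cover every character that appears in the labels is one plausible fix.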