deep-text-recognition-benchmark
ValueError: num_samples should be a positive integer value, but got num_samples=0
I created an LMDB dataset from custom data (e.g. images generated with trdg that contain multiple words per image), following the guide for creating a dataset, and then trained with the command below:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 train.py \
--exp_name CRNN_CTC_demo \
--select_data / \
--batch_ratio 1 \
--train_data lmdb_dataset/train/ --valid_data lmdb_dataset/val/ \
--Transformation TPS --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC
This is the output printed to the screen:
------ Use multi-GPU setting ------
if you stuck too long time with multi-GPU setting, try to set --workers 0
Filtering the images containing characters which are not in opt.character
Filtering the images whose label is longer than opt.batch_max_length
dataset_root: lmdb_dataset/train/ opt.select_data: ['/'] opt.batch_ratio: ['1']
dataset_root: lmdb_dataset/train/ dataset: /
sub-directory: /. num samples: 0
num total samples of /: 0 x 1.0 (total_data_usage_ratio) = 0
num samples of / per batch: 768 x 1.0 (batch_ratio) = 768
Traceback (most recent call last):
File "train.py", line 317, in
The error occurs while loading the data, and I have no idea why or how it happens. When I trained with one word per image, it worked well.
Works: a single word, for example "code".
Fails: multiple words, for example "I love coding".
By the way, I found that labels with special characters between the words cause the same error (e.g. a_b_c_d_e).
I have no idea how to fix the problem. Can anyone help me? Thanks a lot!
You need to check this part.
#186
train.py, line 25:
print('Filtering the images containing characters which are not in opt.character')
print('Filtering the images whose label is longer than opt.batch_max_length')
dataset.py line 168
By default, images containing characters which are not in opt.character are filtered.
line 190
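The two filters mentioned above can be sketched in plain Python to see why a dataset of multi-word labels ends up with "num samples: 0". This is a minimal illustration, not the repository's code: the default character set and the 25-character limit are assumptions about train.py's defaults, the lowercasing assumes case-insensitive training, and filter_reason / count_kept are hypothetical helpers.

```python
import re

# ASSUMPTION: these mirror what we believe are train.py's defaults
# (--character and --batch_max_length); check your own checkout.
DEFAULT_CHARACTER = "0123456789abcdefghijklmnopqrstuvwxyz"
BATCH_MAX_LENGTH = 25


def filter_reason(label, character=DEFAULT_CHARACTER,
                  batch_max_length=BATCH_MAX_LENGTH):
    """Return why a label would be skipped, or None if it would be kept.

    Hypothetical helper illustrating the two filters printed at startup.
    """
    # Filter 1: label longer than batch_max_length.
    if len(label) > batch_max_length:
        return "too long"
    # Filter 2: any character outside the allowed set. re.escape is added
    # here for safety; lowercasing assumes case-insensitive training.
    out_of_char = f"[^{re.escape(character)}]"
    if re.search(out_of_char, label.lower()):
        return "out-of-vocabulary character"
    return None


def count_kept(labels, **kwargs):
    """Number of labels that survive both filters (hypothetical helper)."""
    return sum(1 for label in labels if filter_reason(label, **kwargs) is None)


if __name__ == "__main__":
    labels = ["code", "I love coding", "a_b_c_d_e"]
    for label in labels:
        print(repr(label), filter_reason(label))
    # 'code' None
    # 'I love coding' out-of-vocabulary character
    # 'a_b_c_d_e' out-of-vocabulary character

    # A dataset made only of such labels keeps nothing, i.e. 0 samples.
    print(count_kept(["I love coding", "a_b_c_d_e"]))  # 0

    # Allowing space and underscore lets those labels through.
    print(count_kept(["I love coding", "a_b_c_d_e"],
                     character=DEFAULT_CHARACTER + " _"))  # 2
```

If this matches your data, one option is to add the missing characters (space, underscore, etc.) to opt.character, for example through a --character option if your version of train.py provides one, instead of letting the filter silently drop every sample. Multi-word labels are also longer, so it may be worth checking them against opt.batch_max_length as well.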