eda_nlp

ValueError: empty range for randrange() (0,0, 0)

Open genbei opened this issue 4 years ago • 5 comments

I processed the data according to the format you described. Here are the command I ran and the resulting error:

python code/augment.py --input=train_50w.en --output=train_50w._augmented.txt --num_aug=1 --alpha_sr=0.05 --alpha_rd=0.05 --alpha_ri=0 --alpha_rs=0.05

Traceback (most recent call last):
  File "code/augment.py", line 75, in <module>
    gen_eda(args.input, output, alpha_sr=alpha_sr, alpha_ri=alpha_ri, alpha_rs=alpha_rs, alpha_rd=alpha_rd, num_aug=num_aug)
  File "code/augment.py", line 64, in gen_eda
    aug_sentences = eda(sentence, alpha_sr=alpha_sr, alpha_ri=alpha_ri, alpha_rs=alpha_rs, p_rd=alpha_rd, num_aug=num_aug)
  File "/home/tool/eda_nlp-master/code/eda.py", line 201, in eda
    a_words = random_swap(words, n_rs)
  File "/home/tool/eda_nlp-master/code/eda.py", line 130, in random_swap
    new_words = swap_word(new_words)
  File "/home/tool/eda_nlp-master/code/eda.py", line 134, in swap_word
    random_idx_1 = random.randint(0, len(new_words)-1)
  File "/home/miniconda3/envs/eda/lib/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/home/miniconda3/envs/eda/lib/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,0, 0)

genbei, Jun 03 '21 08:06

Same issue here:

python code/augment.py --input=data/train_original.txt --num_aug=15 --alpha_sr=0.1

Traceback (most recent call last):
  File "code/augment.py", line 75, in <module>
    gen_eda(args.input, output, alpha_sr=alpha_sr, alpha_ri=alpha_ri, alpha_rs=alpha_rs, alpha_rd=alpha_rd, num_aug=num_aug)
  File "code/augment.py", line 64, in gen_eda
    aug_sentences = eda(sentence, alpha_sr=alpha_sr, alpha_ri=alpha_ri, alpha_rs=alpha_rs, p_rd=alpha_rd, num_aug=num_aug)
  File "/Users/bernardogarcia/GitHub/eda_nlp/code/eda.py", line 194, in eda
    a_words = random_insertion(words, n_ri)
  File "/Users/bernardogarcia/GitHub/eda_nlp/code/eda.py", line 153, in random_insertion
    add_word(new_words)
  File "/Users/bernardogarcia/GitHub/eda_nlp/code/eda.py", line 160, in add_word
    random_word = new_words[random.randint(0, len(new_words)-1)]
  File "/Users/bernardogarcia/opt/anaconda3/envs/nlp-news_filter/lib/python3.7/random.py", line 222, in randint
    return self.randrange(a, b+1)
  File "/Users/bernardogarcia/opt/anaconda3/envs/nlp-news_filter/lib/python3.7/random.py", line 200, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,0, 0)

bergr7, Jun 08 '21 13:06

In code/eda.py, the main function eda begins with the following call at line 175:

sentence = get_only_chars(sentence)

The get_only_chars function preprocesses the text by removing non-alphabetic characters. If an input line consists only of non-alphabetic characters, nothing is left after cleaning, the word list is empty, and len(words)-1 evaluates to -1 in calls such as the one below (line 117, among others). random.randint(0, -1) then raises the ValueError.

rand_int = random.randint(0, len(words)-1)

If your input contains lines made up only of non-alphabetic characters, you can avoid this error by extending the allowed-character check in get_only_chars (line 45) so that those characters are no longer stripped, for example:

if char in 'qwertyuiopasdfghjklzxcvbnm123456789(). ':

msub0310, Dec 04 '21 10:12
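
The failure described above can be reproduced in isolation. The following is a minimal sketch, not code from the repository: `clean` and `tokenize` are hypothetical stand-ins that approximate what get_only_chars is described as doing (keeping only lowercase letters and spaces). It shows that an all-punctuation line leaves an empty word list, at which point random.randint(0, len(words)-1) raises ValueError.

```python
import random

# Assumption: only lowercase letters and the space character survive cleaning.
ALLOWED = "qwertyuiopasdfghjklzxcvbnm "


def clean(sentence):
    """Hypothetical stand-in for get_only_chars: drop characters not in ALLOWED."""
    return "".join(ch for ch in sentence.lower() if ch in ALLOWED)


def tokenize(sentence):
    """Split the cleaned sentence on spaces and discard empty strings."""
    return [w for w in clean(sentence).split(" ") if w]


words = tokenize("------")
print(words)  # []

try:
    # With an empty list this is random.randint(0, -1), an empty range.
    random.randint(0, len(words) - 1)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```

The exact wording of the exception message differs between Python versions, so the sketch only relies on the exception type.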

I had the same problem. It is most likely a data issue: the error occurs when a sentence is empty or consists only of special tokens such as "------". Check your data.

zhoujiangfeng, Jan 05 '22 07:01
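
One way to check the data before augmenting is to scan for lines that would end up with no words. This is a rough sketch under two assumptions that are not confirmed in this thread: that each input line has the form label, tab, sentence, and that only the characters a-z survive cleaning. The function name `find_unusable_lines` is made up for illustration.

```python
import re


def find_unusable_lines(lines):
    """Return (line_number, reason) pairs for lines likely to break augmentation."""
    problems = []
    for number, line in enumerate(lines, start=1):
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            problems.append((number, "missing tab-separated sentence"))
            continue
        sentence = parts[1]  # assumption: the sentence is the second field
        if not re.search(r"[a-z]", sentence.lower()):
            problems.append((number, "no a-z characters in sentence"))
    return problems


sample = ["1\tthis is fine\n", "0\t------\n", "1\t\n", "no tab here\n"]
for number, reason in find_unusable_lines(sample):
    print(number, reason)
```

Because the function accepts any iterable of strings, an open file object can be passed to it directly. The reported lines can then be removed or fixed before running augment.py.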

I also have this problem. Could it be related to the dataset? My dataset has three columns, but the dataset in this repository has two. Could you please help?

shakiba-bakhtiari, Jun 04 '22 15:06
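
On the three-column question: if the script reads the label from the first tab-separated field and the sentence from the second (an assumption about augment.py that this thread does not confirm), an extra column could cause the wrong field to be treated as the sentence. A possible workaround is to reduce the file to two columns first. In the sketch below, the helper `to_two_columns` and the default column positions are hypothetical and need to be adjusted to the actual file layout.

```python
def to_two_columns(line, label_idx=0, text_idx=2, sep="\t"):
    """Keep only the label and text fields of one delimited line.

    label_idx and text_idx are assumptions about where those fields live.
    """
    fields = line.rstrip("\n").split(sep)
    needed = max(label_idx, text_idx) + 1
    if len(fields) < needed:
        raise ValueError(f"expected at least {needed} fields, got {len(fields)}")
    return f"{fields[label_idx]}{sep}{fields[text_idx]}"


# Made-up rows: label, an extra id column, then the sentence.
rows = ["1\tid-001\ta good movie\n", "0\tid-002\tnot my taste\n"]
converted = [to_two_columns(r) for r in rows]
print(converted)
```

Raising on short lines, rather than skipping them silently, makes malformed rows visible before they reach the augmentation step.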