question_generation
Avoid ValueError: substring not found
In some cases, the answer can't be found in the input text and a ValueError is raised; adding a try/except would avoid such errors.
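Something like the sketch below around the sent.index(answer_text) call would do it. This is only an illustration with stand-in values, not the exact code in pipelines.py:

# Rough sketch of the proposed try/except around the failing lookup;
# sent and answer_text stand in for the variables used in pipelines.py.
sent = "There is limited research to suggest that stinging nettle is an effective remedy."
answer_text = "an answer that is not in this sentence"

try:
    ans_start_idx = sent.index(answer_text)
    sent = f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text):]}"
except ValueError:
    # Answer could not be located in the sentence; skip it instead of crashing.
    pass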
In my case, the substring was not found because the answers were padded (i.e. prefixed with a "<pad>" token).
Yes please! I have found the same error but haven't fully worked out why just yet.
Here is a minimal example:
from pipelines import pipeline
# load in the multi task qa qg
MODEL = pipeline("multitask-qa-qg")
# problem text
text = 'The herb is generally safe to use. There is limited research to suggest that stinging nettle is an effective remedy. Researchers need to do more studies before they can confirm the health benefits of stinging nettle.'
MODEL(text)
Full stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-59-1ab007d28390> in <module>()
7 text = 'The herb is generally safe to use. There is limited research to suggest that stinging nettle is an effective remedy. Researchers need to do more studies before they can confirm the health benefits of stinging nettle.'
8
----> 9 MODEL(text)
2 frames
/content/question_generation/pipelines.py in _prepare_inputs_for_qg_from_answers_hl(self, sents, answers)
140 answer_text = answer_text.strip()
141
--> 142 ans_start_idx = sent.index(answer_text)
143
144 sent = f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text): ]}"
ValueError: substring not found
In this specific case, I found out that the error occurs because, for the sentence "Researchers need to do more studies before they can confirm the health benefits of stinging nettle.", the generated answer is "Do more studies" instead of "do more studies". In ans_start_idx = sent.index(answer_text) (line 142), str.index is case-sensitive, so looking up "Do more studies" raises this ValueError.
Since the T5 model is uncased anyway, a simple solution would be to replace line 137 and line 140 in pipelines.py respectively with:
sent = sents[i].lower()
answer_text = answer_text.strip().lower()
This should solve your problem :)
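To see the case-sensitivity issue in isolation, here is a small self-contained snippet (separate from the repo code) using the sentence and answer from above:

sent = "Researchers need to do more studies before they can confirm the health benefits of stinging nettle."
answer_text = "Do more studies"  # generated answer, capitalised differently from the source sentence

# sent.index(answer_text) would raise ValueError: substring not found, because str.index is case-sensitive.
sent = sent.lower()                         # mirrors the suggested change to line 137
answer_text = answer_text.strip().lower()   # mirrors the suggested change to line 140
ans_start_idx = sent.index(answer_text)     # now succeeds
print(f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text):]}")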
The error was mainly caused by the "<pad>" token appearing at the beginning of some answers, which meant the index of the answer couldn't be found in sent.
So I added the following line at 141 to remove the "<pad>" token (note that this needs re to be imported in pipelines.py, if it isn't already):
answer_text = re.sub("<pad> | <pad>", "", answer_text)
After this addition, the code has worked on all the examples I've tried so far.
Cheers!
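For reference, a defensive version that combines the three workarounds mentioned in this thread (stripping "<pad>", lowercasing, and skipping answers that still can't be matched) could look roughly like this. highlight_answer is a hypothetical helper for illustration, not a function in pipelines.py:

import re

def highlight_answer(sent, answer_text):
    # Hypothetical helper combining the workarounds from this thread; not repo code.
    answer_text = re.sub("<pad> | <pad>", "", answer_text).strip()  # drop stray pad tokens
    sent, answer_text = sent.lower(), answer_text.lower()           # case-insensitive lookup
    try:
        ans_start_idx = sent.index(answer_text)
    except ValueError:
        return None  # answer not found even after normalisation; caller can skip it
    return f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text):]}"

Returning None for unmatched answers mirrors the try/except suggested at the top of this thread, so the caller can simply skip those answers instead of crashing.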