
ValueError: substring not found

Open vidyap-xgboost opened this issue 5 years ago • 6 comments

I ran the following code on colab:

from pipelines import pipeline
nlp = pipeline("question-generation")

text = """FIO Labs is an independent, privately-owned company with a global reach.
With the agility of a startup and the ability of a conglomerate, we help
businesses understand and adopt Artificial Intelligence & Data Security
technologies in the right framework and help them stay aligned with their
strategic objectives."""

nlp(text)

and I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-23-fdf5d062d391> in <module>()
----> 1 nlp(text)

1 frames
/content/question_generation/pipelines.py in _prepare_inputs_for_qg_from_answers_hl(self, sents, answers)
    140                 answer_text = answer_text.strip()
    141 
--> 142                 ans_start_idx = sent.index(answer_text)
    143 
    144                 sent = f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_start_idx + len(answer_text): ]}"

ValueError: substring not found

I don't understand why this error occurs when the text I passed in is valid.

This also occurred with

text8 = """Prashanth Bandi is one of the highly regarded consultants in the IT world, he \
is a Technology Evangelist with 18 years of consulting experience dealing with \
diverse problems and delivering technology solutions to complex business \
challenges. His adaptive nature, perseverance and genuine passion for \
technology makes him the torch bearer of our company.
"""

vidyap-xgboost avatar Aug 21 '20 09:08 vidyap-xgboost

Hi @vidyap-xgboost, this is a known issue and I'm working on a fix; see issue #11. Sorry for the inconvenience. I will let you know when it's fixed.

patil-suraj avatar Aug 21 '20 10:08 patil-suraj

Thanks for the heads-up. Will look forward to the fix.

vidyap-xgboost avatar Aug 21 '20 10:08 vidyap-xgboost

For the time being:

if answer_text in sent:
    ans_start_idx = sent.index(answer_text)
else:
    continue

This lets you bypass the ValueError by skipping mismatched answer/sent pairs. In a test, it continued to match answer/sent pairs beyond the point where I had been getting the error: it had been failing after 3 matches, and with the if statement it completed the task with a total of 10 answer/question pairs. It is not a fix, but it will let you process an entire, complex text.

I was using James Baldwin's essay, "If Black English Isn't a Language, Then Tell Me, What Is?", specifically for stress testing. You can find it here:

(https://archive.nytimes.com/www.nytimes.com/books/98/03/29/specials/baldwin-english.html?_r=1&oref=slogin)
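
For context, here is a minimal sketch of how that guard might sit inside a highlighting loop. The function name highlight_answers, the parallel-list input shape, and the skipped list are assumptions made for illustration; this is not the actual code from pipelines.py, only the same index-and-wrap step shown in the traceback with the membership check in front of it.

```python
def highlight_answers(sents, answers):
    """Wrap each answer in <hl> markers, skipping answers not found verbatim.

    Hypothetical stand-in for the loop in the traceback: `sents` is a list of
    sentences and `answers` is a parallel list of answer lists (an assumption).
    """
    highlighted = []
    skipped = []  # kept so that dropped pairs can be inspected afterwards
    for sent, sent_answers in zip(sents, answers):
        for answer_text in sent_answers:
            answer_text = answer_text.strip()
            if answer_text not in sent:
                # Mismatched pair: record it and move on instead of raising.
                skipped.append((sent, answer_text))
                continue
            ans_start_idx = sent.index(answer_text)
            ans_end_idx = ans_start_idx + len(answer_text)
            highlighted.append(
                f"{sent[:ans_start_idx]} <hl> {answer_text} <hl> {sent[ans_end_idx:]}"
            )
    return highlighted, skipped


sents = ["FIO Labs is an independent company."]
answers = [["FIO Labs", "fio labs"]]  # the second answer differs in case
highlighted, skipped = highlight_answers(sents, answers)
print(len(highlighted), "highlighted,", len(skipped), "skipped")
```

Returning the skipped pairs alongside the results makes it easier to see how much of the text the workaround is silently dropping.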

danielmoore19 avatar Aug 26 '20 19:08 danielmoore19

Hi @patil-suraj is this issue fixed now ?

ankitkr3 avatar Sep 22 '20 12:09 ankitkr3

I would suggest using the if statement as a workaround. The error has three causes: sometimes the answers come out in the wrong order; sometimes the answer and the answer span do not match exactly (one is "Capital" and the other is "capital"); and sometimes the model produces an answer that does not appear in the text at all. The first is an issue with ordering what comes out of the I/O, the second can be fixed by lowercasing both sent and answer_text before calling sent.index(), and the last is an issue inside the model itself. The if statement is therefore the only thing that bypasses all three errors. Using .lower() together with the if statement ensures you still get answers that appear in the text but differ in capitalization.

Added: in rarer cases, an "s" gets added to or dropped from the answer span.
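
One way to combine the membership check with case-insensitive matching is sketched below. The helper name find_answer_span is hypothetical, and it swaps the .lower() approach for re.search with re.IGNORECASE so that the returned indices always refer to the original, un-lowercased sentence. It does not handle the added or dropped "s" case.

```python
import re


def find_answer_span(sent, answer_text):
    """Return (start, end) of answer_text in sent, or None if it is absent.

    Hypothetical helper: tries an exact match first, then falls back to a
    case-insensitive search.
    """
    answer_text = answer_text.strip()
    if not answer_text:
        return None
    exact = sent.find(answer_text)
    if exact != -1:
        return exact, exact + len(answer_text)
    # re.escape treats the answer as literal text, not as a regex pattern.
    match = re.search(re.escape(answer_text), sent, flags=re.IGNORECASE)
    return match.span() if match else None


sent = "The Capital of France is Paris."
for answer in ("Paris", "capital", "Berlin"):
    span = find_answer_span(sent, answer)
    if span is None:
        print(f"{answer!r}: not in sentence, skipping")
        continue
    start, end = span
    # Slice the sentence so the highlight keeps the original capitalization.
    print(f"{sent[:start]} <hl> {sent[start:end]} <hl> {sent[end:]}")
```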

danielmoore19 avatar Sep 23 '20 19:09 danielmoore19

When you call string_object.index(substring), Python searches string_object for the first occurrence of substring. If the substring is present, the method returns the index at which it starts; otherwise, it raises ValueError: substring not found.
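
As a small illustration of that behavior, the snippet below contrasts str.index with str.find, which returns -1 instead of raising. The sample sentence is shortened from the text in the original report.

```python
sentence = "FIO Labs is an independent, privately-owned company."

# index() returns the position of the first match.
labs_position = sentence.index("Labs")
print(labs_position)

# find() performs the same search but returns -1 when nothing matches.
print(sentence.find("labs"))  # -1, because the search is case sensitive

# index() raises ValueError for the same missing substring.
try:
    sentence.index("labs")
except ValueError as err:
    print(f"ValueError: {err}")
```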

Using Python’s “in” operator

The simplest and fastest way to check whether a string contains a substring in Python is the “in” operator. It returns True if the string contains the substring and False otherwise.

text = "Hello, World!"  # avoid naming a variable str, which shadows the built-in type
print("World" in text)  # output is True

Python's “in” operator takes two operands, one on the left and one on the right, and returns True if the left string is contained within the right string. Note that the “in” operator is case sensitive, i.e., it treats uppercase and lowercase characters as different.
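
A short sketch of that case sensitivity, with one possible way around it. The helper name contains_ignoring_case is made up for this example; it relies on str.casefold, the standard-library method intended for caseless comparison.

```python
def contains_ignoring_case(haystack, needle):
    """Case-insensitive membership test built on the `in` operator."""
    return needle.casefold() in haystack.casefold()


greeting = "Hello, World!"
print("world" in greeting)                         # False: "w" does not match "W"
print(contains_ignoring_case(greeting, "world"))   # True
```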

ronaldgevern avatar Aug 08 '22 05:08 ronaldgevern