
BERT SQuAD 2 fails on specific types of questions - Found New Info...

Open mfeblowitz opened this issue 6 years ago • 10 comments

I have found a few cases where run_squad on tensorflow consistently fails.

Here are the cases I can identify:

  • questions containing embedded commas or apostrophes
  • questions containing abbreviations (e.g., EU and PCB)
  • questions missing a preposition (and similar grammatical problems?)

Ten phrases identified as problematic (among the 605 phrases I've automatically extracted):

Aged wood crossarms and X Braces of H Frame structures reach end of life loose strength and fail
BGE LNG Propane Plant Explosion
EU LNG Propane Plant Leak
Equipment PCB Spill greater than 50 ppm on Distribution System in Waterway or Sensitive Area
UCE 46 Transmission Overhead Reliability Related to Wood Structures
Uneconomic Coal Plants Remain in Stack
Uneconomic coal remains in stack
Unfavorable Earned ROEs at BGE
Unfavorable Earned ROEs at PECO
Vault Roofs Chicago

The questions are each built on a phrase X (like the ones above) and constructed by asking one of:

What causes X?
What does X cause?

So,

What does BGE LNG Propane Plant Explosion cause?
What causes BGE LNG Propane Plant Explosion?
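
For concreteness, here's a hypothetical sketch of how such a question set can be assembled into SQuAD 2.0 JSON; the phrase list, context, and file name are stand-ins, not the actual extraction pipeline:

import json

phrases = [
    "BGE LNG Propane Plant Explosion",
    "Uneconomic Coal Plants Remain in Stack",
]
context = "..."  # the passage the questions are answered against

qas = []
for i, phrase in enumerate(phrases):
  for j, template in enumerate(["What causes {}?", "What does {} cause?"]):
    qas.append({
        "id": "q{}-{}".format(i, j),
        "question": template.format(phrase),
        "answers": [],
        "is_impossible": False,  # SQuAD 2.0 flags unanswerable questions
    })

dataset = {
    "version": "v2.0",
    "data": [{
        "title": "generated",
        "paragraphs": [{"context": context, "qas": qas}],
    }],
}

with open("generated_squad.json", "w") as f:
  json.dump(dataset, f, indent=2)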

Setting aside the abbreviation cases and discarding the oddball Vault Roofs Chicago, these two fail:

What causes Uneconomic Coal Plants Remain in Stack?
What does Uneconomic coal remains in stack cause?

(The other two combinations of these phrases also fail.)

They are repaired by adding a preposition and adjusting the verb form:

What causes Uneconomic Coal Plants to Remain in Stack?
What does Uneconomic coal to remain in stack cause?

Something in the processing causes run_squad to terminate without completing the full question set. The failure mode always looks something like this:

Traceback:

Traceback (most recent call last):
  File "run_squad.py", line 1286, in <module>
    tf.app.run()
  File "/home/.../site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_squad.py", line 1279, in main
    output_nbest_file, output_null_log_odds_file)
  File "run_squad.py", line 909, in write_predictions
    score_diff = score_null - best_non_null_entry.start_logit - (
AttributeError: 'NoneType' object has no attribute 'start_logit'

Specifically, **best_non_null_entry** is None, so there is no **start_logit** to read.

The error occurs with a null best_non_null_entry at line 906 in write_predictions, which appears to be addressed in #476 and #477.
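
For context, the selection logic in write_predictions looks roughly like this (paraphrased from run_squad.py, not a verbatim quote):

# best_non_null_entry is only assigned when some n-best entry has non-empty
# text, so if every candidate is the null ("") prediction it stays None...
best_non_null_entry = None
for entry in nbest:
  total_scores.append(entry.start_logit + entry.end_logit)
  if not best_non_null_entry:
    if entry.text:
      best_non_null_entry = entry

# ...and the line from the traceback then dereferences None:
score_diff = score_null - best_non_null_entry.start_logit - (
    best_non_null_entry.end_logit)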

As mentioned above, these questions are generated from a large number of phrases, so it's impractical to manually or automatically detect and remove all errant cases (abbreviations and punctuation, yes; missing prepositions and similar cases, not so much).

It would be great to understand the underlying cause for these failures (and a fix). Failing that, some kind of non-terminating exception handling would suffice.

mfeblowitz avatar Feb 05 '19 20:02 mfeblowitz

@mfeblowitz Thanks for reporting and debugging this. I am running into the same issue. I'm just curious why so few of us are hitting it, and why, given the age of this report, there has been no movement at all on it. Perhaps it's an even rarer occurrence than it seems?

How did you resolve it? Sorry, I'm a bit unclear: did you remove the offending examples, or fix the code to get around them?

Appreciate the response. Anshoo

anshoomehra avatar May 01 '19 13:05 anshoomehra

I was only able to make it go away by purging the apostrophes and/or the commas from the text.
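
For anyone who wants the same workaround, a minimal sketch (the function name and regex are mine):

import re

def purge_punctuation(question):
  # Workaround only: strip apostrophes and commas that trigger the crash.
  # This masks the symptom; the write_predictions bug itself remains.
  return re.sub(r"[',\u2019]", "", question)

print(purge_punctuation("What causes the plant's explosion, exactly?"))
# -> What causes the plants explosion exactly?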

mfeblowitz avatar May 14 '19 22:05 mfeblowitz

OK, so it appears that there might be multiple reasons why this failure comes up: complex sentences, unhandled special characters in the text (including embedded apostrophes or commas), ...

It does appear that some exception happens, and rather than moving on to the next question, the processing stops. I'm thinking that some additional exception handling could be inserted, along with a flag indicating that the errant question should be reported and skipped.

I have switched to a different approach: I ask each question in a separate call to bert, rather than bundling them all into one (huge) questionnaire, and I catch exceptions and move on to the next question. By also capturing the sentence and the exception in the log, I'll be able to follow this up with characterizations of what leads to the failure.
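
In outline, the driver looks something like this; ask_bert() is a stand-in for whatever wraps the single-question run_squad call, not a real API:

import logging

def ask_bert(question):
  raise NotImplementedError("stand-in for a single-question run_squad call")

questions = ["What causes BGE LNG Propane Plant Explosion?"]
answers, failed = {}, []
for question in questions:
  try:
    answers[question] = ask_bert(question)
  except Exception as exc:  # e.g. the AttributeError above
    # Record the offending question and the exception, then move on.
    logging.warning("Skipping question %r: %s", question, exc)
    failed.append(question)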

mfeblowitz avatar Jun 13 '19 19:06 mfeblowitz

Edits to the original problem statement were made after the comments above.

mfeblowitz avatar Jun 14 '19 20:06 mfeblowitz

Question to @Anton-Velikodnyy RE #477: Does this fix address the situation above, or merely prevent a mid-processing crash? The latter is good; the former would be even better.

mfeblowitz avatar Jun 19 '19 22:06 mfeblowitz

I tried two ways to solve the problem, but both failed, and I wonder why. The line

score_diff = score_null - best_non_null_entry.start_logit - (
    best_non_null_entry.end_logit)
scores_diff_json[example.qas_id] = score_diff

still raises the error: AttributeError: 'NoneType' object has no attribute 'start_logit'.

Method A:

if FLAGS.version_2_with_negative:
  if "" not in seen_predictions:
    nbest.append(
        _NbestPrediction(
            text="",
            start_logit=null_start_logit,
            end_logit=null_end_logit))

  # In very rare edge cases we could only have single null prediction.
  # So we just create a nonce prediction in this case to avoid failure.
  if len(nbest) == 1:
    nbest.insert(
        0, _NbestPrediction(text="empty", start_logit=0.0, end_logit=0.0))

# In very rare edge cases we could have no valid predictions. So we
# just create a nonce prediction in this case to avoid failure.
if not nbest:

Method B:

if best_non_null_entry:
  score_diff = score_null - best_non_null_entry.start_logit - (
      best_non_null_entry.end_logit)
else:
  # all n-best entries are null, so we assign a diff higher than the threshold
  score_diff = FLAGS.null_score_diff_threshold + 1.0
scores_diff_json[example.qas_id] = score_diff

if score_diff > FLAGS.null_score_diff_threshold:
  all_predictions[example.qas_id] = ""

zjjhuihui avatar Jun 21 '19 06:06 zjjhuihui

I faced the same problem and was checking whether anyone had fixed this error. I wrapped the offending line in a try/except, but I don't know whether performance is affected or not.

# predict "" iff the null score - the score of best non-null > threshold
try:
  score_diff = score_null - best_non_null_entry.start_logit - (
      best_non_null_entry.end_logit)
except AttributeError:
  # best_non_null_entry is None: fall back to the null score alone
  score_diff = score_null

scores_diff_json[example.qas_id] = score_diff
if score_diff > FLAGS.null_score_diff_threshold:
  all_predictions[example.qas_id] = ""
else:
  all_predictions[example.qas_id] = best_non_null_entry.text

alphamaviwiki avatar May 18 '21 15:05 alphamaviwiki

Having the same issue in 2022. I used an extra try/except near line 906 to get it to run, but I don't know whether it affects the results or not.

# predict "" iff the null score - the score of best non-null > threshold
try:
  score_diff = score_null - best_non_null_entry.start_logit - (
      best_non_null_entry.end_logit)
except AttributeError:
  # best_non_null_entry is None: fall back to the null score alone
  score_diff = score_null

scores_diff_json[example.qas_id] = score_diff
if score_diff > FLAGS.null_score_diff_threshold:
  all_predictions[example.qas_id] = ""
else:
  try:
    all_predictions[example.qas_id] = best_non_null_entry.text
  except AttributeError:
    # best_non_null_entry can still be None here; emit the null answer
    all_predictions[example.qas_id] = ""
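
For what it's worth, an explicit None check expresses the same fallback without the blanket excepts; this is an untested sketch of the idea, not a vetted patch:

if best_non_null_entry is None:
  # No non-null candidate survived, so treat the question as unanswerable.
  score_diff = FLAGS.null_score_diff_threshold + 1.0
else:
  score_diff = score_null - best_non_null_entry.start_logit - (
      best_non_null_entry.end_logit)

scores_diff_json[example.qas_id] = score_diff
if score_diff > FLAGS.null_score_diff_threshold:
  all_predictions[example.qas_id] = ""
else:
  all_predictions[example.qas_id] = best_non_null_entry.text

The guard guarantees that the final else branch only runs when best_non_null_entry is set, so the .text access is safe.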

lackhole avatar Feb 08 '22 07:02 lackhole
