paraphrase-id-tensorflow

RuntimeError: Unrecognized line format

Open p-null opened this issue 6 years ago • 2 comments

Hi, I am running the BiMPM model to predict and get the following error:

 66%|███████████████████████████████████████████████████████████████▊                                 | 2345735/3563475 [09:22<04:51, 4172.73it/s]
Traceback (most recent call last):
  File "run_bimpm.py", line 267, in <module>
    main()
  File "run_bimpm.py", line 160, in main
    mode="word+character")
  File "../../duplicate_questions/data/data_manager.py", line 390, in get_test_data_from_file
    self.instance_type)
  File "../../duplicate_questions/data/dataset.py", line 145, in read_from_file
    return TextDataset.read_from_lines(lines, instance_class)
  File "../../duplicate_questions/data/dataset.py", line 177, in read_from_lines
    instances = [instance_class.read_from_line(line) for line in tqdm(lines)]
  File "../../duplicate_questions/data/dataset.py", line 177, in <listcomp>
    instances = [instance_class.read_from_line(line) for line in tqdm(lines)]
  File "../../duplicate_questions/data/instances/sts_instance.py", line 118, in read_from_line
    raise RuntimeError("Unrecognized line format: " + line)
RuntimeError: Unrecognized line format: "life in dublin?"""

For now, my temporary workaround is to delete the else branch, so that unrecognized lines are skipped.

p-null, Jun 28 '18 23:06

Yeah, that isn't a proper NLI instance, right? It expects [id],[question1],[question2].

Skipping is an acceptable workaround, but I think the better solution would be to reformat the data you use :)
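To find which lines need reformatting, something like the following could help. It is only a sketch: `find_malformed_lines` is a hypothetical helper, not part of this repo, and it assumes each instance is a single CSV line with exactly three fields, per the [id],[question1],[question2] format above.

```python
import csv


def find_malformed_lines(lines, expected_fields=3):
    """Return (line_number, line) pairs that do not parse into
    exactly `expected_fields` CSV fields.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            # Parse the single line as one CSV record. strict=True makes
            # the parser raise csv.Error on malformed quoting.
            fields = next(csv.reader([line], strict=True))
        except (csv.Error, StopIteration):
            bad.append((number, line))
            continue
        if len(fields) != expected_fields:
            bad.append((number, line))
    return bad


# Made-up sample: the second entry is the fragment from the error message.
sample = ['"1","a question","another question"', '"life in dublin?"""']
for number, line in find_malformed_lines(sample):
    print(number, line)
```

Running this over the raw lines of the test file before prediction would list every offending line together with its line number, which makes it easier to see what the surrounding rows look like.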

nelson-liu, Jun 28 '18 23:06

I use the Quora question dataset from Kaggle, which is the same as yours. I found this line in test_final.csv:

"2162206","What is the minimum salary needed to live a decent life in Malaysia?","What is the minimum salary needed to live a decent life in dublin?"

It is a proper instance. I am confused, since both this instance and the code look right. By the way, I can't simply delete the line, because Kaggle won't score the submission if the number of rows is wrong.
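One thing worth checking, although it is only a guess and not confirmed anywhere in this thread: the text in the error, "life in dublin?""", starts with a quote and ends with three, so it does not exactly match the row above and may come from a different row. If some question contains a line break inside its quotes, reading the file line by line would cut that record into pieces, whereas Python's csv module keeps it as one record. A minimal sketch with made-up data (`read_records` is a hypothetical helper, not code from the repo):

```python
import csv
import io


def read_records(text):
    """Parse CSV text into records, keeping newlines that appear
    inside quoted fields (hypothetical helper for illustration)."""
    # newline="" leaves line endings untouched so the csv module can
    # handle newlines embedded in quoted fields itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up data: the first record has a line break inside question1.
raw = '"1","How is life in\nDublin?","Is Dublin nice?"\n"2","a","b"\n'

naive_lines = raw.splitlines()   # splits the first record in two
records = read_records(raw)      # keeps the first record whole

print(len(naive_lines), len(records))
```

If the record count from the csv module matches the number of rows Kaggle expects while the raw line count does not, that would point to embedded line breaks, and no rows would have to be dropped.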

p-null, Jun 29 '18 12:06