paraphrase-id-tensorflow
RuntimeError: Unrecognized line format
Hi, I am running the BiMPM model to predict and am getting the following error:
Traceback (most recent call last):
File "run_bimpm.py", line 267, in <module>
66%|███████████████████████████████████████████████████████████████▊ | 2345735/3563475 [09:22<04:51, 4172.73it/s]
main()
File "run_bimpm.py", line 160, in main
mode="word+character")
File "../../duplicate_questions/data/data_manager.py", line 390, in get_test_data_from_file
self.instance_type)
File "../../duplicate_questions/data/dataset.py", line 145, in read_from_file
return TextDataset.read_from_lines(lines, instance_class)
File "../../duplicate_questions/data/dataset.py", line 177, in read_from_lines
instances = [instance_class.read_from_line(line) for line in tqdm(lines)]
File "../../duplicate_questions/data/dataset.py", line 177, in <listcomp>
instances = [instance_class.read_from_line(line) for line in tqdm(lines)]
File "../../duplicate_questions/data/instances/sts_instance.py", line 118, in read_from_line
raise RuntimeError("Unrecognized line format: " + line)
RuntimeError: Unrecognized line format: "life in dublin?"""
For now, my temporary workaround is to delete the else branch, so that unrecognized lines are skipped.
Yeah, that isn't a proper NLI instance, right? It expects [id],[question1],[question2].
Skipping is an acceptable workaround, but I think the better solution would be to reformat the data you use :)
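To make the expected [id],[question1],[question2] format concrete, here is a minimal sketch of how a single line could be checked with Python's standard csv module. The helper name parse_test_line is hypothetical and is not the repository's read_from_line; it only illustrates why the fragment in the traceback is rejected while the full row is accepted.

```python
import csv


def parse_test_line(line):
    """Hypothetical helper: parse one line as [id],[question1],[question2]."""
    # csv.reader handles quoted fields and doubled quotes ("") inside them.
    fields = next(csv.reader([line]))
    if len(fields) != 3:
        raise RuntimeError("Unrecognized line format: " + line)
    return fields


# The full row parses into three fields.
good = (
    '"2162206",'
    '"What is the minimum salary needed to live a decent life in Malaysia?",'
    '"What is the minimum salary needed to live a decent life in dublin?"'
)
print(parse_test_line(good))

# The fragment from the traceback yields a single field, so it is rejected.
fragment = '"life in dublin?"""'
try:
    parse_test_line(fragment)
except RuntimeError as err:
    print(err)
```

If a complete row parses but a fragment of it fails, the line was probably cut somewhere before it reached the parser, though that is an assumption the traceback alone does not confirm.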
I use the Quora question dataset from Kaggle, which is the same as yours.
I found that line in test_final.csv:
"2162206","What is the minimum salary needed to live a decent life in Malaysia?","What is the minimum salary needed to live a decent life in dublin?"
It is a proper instance.
I am confused, since both this instance and the code look right.
By the way, I can't simply drop such lines, otherwise Kaggle won't score the submission because the number of rows would be wrong.
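Given the row-count constraint, one possible approach is to read the file record by record and keep a placeholder for anything malformed instead of skipping it. The sketch below uses only the standard csv module. The function name read_rows_keep_count and the sample data are hypothetical, and it assumes (unverified here) that some records may contain a newline inside a quoted question, which line-by-line reading would split in two.

```python
import csv
import io


def read_rows_keep_count(handle):
    """Hypothetical helper: return one (id, q1, q2) tuple per CSV record.

    Malformed records become placeholders with empty questions, so the
    number of output rows still matches the number of records.
    """
    rows = []
    for record in csv.reader(handle):
        if len(record) == 3:
            rows.append(tuple(record))
        else:
            # Keep the slot (and the id, if present) rather than skipping.
            rows.append((record[0] if record else "", "", ""))
    return rows


# Made-up sample: the first record has a newline inside a quoted field,
# the second record is missing a question.
sample = io.StringIO(
    '"1","first question?","a question split\nover two lines?"\n'
    '"2","only one question"\n'
)
rows = read_rows_keep_count(sample)
print(len(rows))
print(rows[1])
```

When reading a real file this way, open it with newline="" as the csv module documentation recommends, so that newlines inside quoted fields are passed through to the reader unchanged.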