
cort-predict-raw runs on Python 2 but not Python 3.5

Status: Open. bennytieu opened this issue 8 years ago (4 comments).

I was trying to run cort-predict-raw with the following command:

python3.5 /usr/local/bin/cort-predict-raw -in ~/data/pilot_44_docs/*.txt -model models/model-pair-train.obj -extractor cort.coreference.approaches.mention_ranking.extract_substructures -perceptron cort.coreference.approaches.mention_ranking.RankingPerceptron -clusterer cort.coreference.clusterer.all_ante -corenlp ~/systems/stanford/stanford-corenlp-full-2016-10-31

and got the following error message:

Traceback (most recent call last):
  File "/usr/local/bin/cort-predict-raw", line 136, in <module>
    doc.system_mentions = mention_extractor.extract_system_mentions(doc)
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in extract_system_mentions
    for span in __extract_system_mention_spans(document)]
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mention_extractor.py", line 36, in <listcomp>
    for span in __extract_system_mention_spans(document)]
  File "/usr/local/lib/python3.5/dist-packages/cort/core/mentions.py", line 126, in from_document
    i, sentence_span = document.get_sentence_id_and_span(span)
TypeError: 'NoneType' object is not iterable
2017-04-27 09:17:06,058 WARNING Killing subprocess 14154
2017-04-27 09:17:06,395 INFO Subprocess seems to be stopped, exit code -9
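The last frame unpacks the return value of `document.get_sentence_id_and_span(span)` into two names. A minimal sketch of how that line can fail, assuming (this is not confirmed in the thread) that the lookup returns `None` when a mention span does not fit inside any known sentence. `find_sentence` and the token indices below are hypothetical illustrations, not cort's actual code:

```python
def find_sentence(sentence_spans, span):
    """Hypothetical stand-in for a sentence lookup (not cort's code).

    Returns (sentence index, sentence span) for the first sentence whose
    token range contains `span`; otherwise returns None.
    """
    start, end = span
    for i, (s_start, s_end) in enumerate(sentence_spans):
        if s_start <= start and end <= s_end:
            return i, (s_start, s_end)
    return None


# One sentence covering token indices 0..8 (9 tokens, the count of the
# tokenization reported later in this thread).
sentences = [(0, 8)]

# A span inside the sentence unpacks without problems.
i, sentence_span = find_sentence(sentences, (4, 5))

# A span ending past the last known token (e.g. one computed over a token
# list in which the number group was split into three pieces) matches no
# sentence, so the lookup returns None and tuple unpacking raises TypeError.
failure = None
try:
    i, sentence_span = find_sentence(sentences, (4, 10))
except TypeError as err:
    failure = err
    print("TypeError:", err)
```

Python 3.5 words this as `'NoneType' object is not iterable`; newer Python 3 releases say `cannot unpack non-iterable NoneType object`, but it is the same error.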

It works without problems under Python 2, though. I'm running this on Ubuntu 16.04.

bennytieu, Apr 27 '17 09:04

Can you isolate (and post) the document which causes the error message?

smartschat, Apr 27 '17 09:04

I have isolated it to this string:

Contact for company: Sven Svensson 212 584 5242 [email protected].

I'm guessing the sequence of numbers is at fault. Single numbers are OK; for example, years like 2017 appear in other documents without causing problems.
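One way to test this guess is to check whether the digits in the failing string are separated by something other than a plain ASCII space. A minimal sketch (the helper name `unusual_whitespace` and the sample text are hypothetical; the sample assumes U+00A0 NO-BREAK SPACE separators and leaves out the email address):

```python
import unicodedata


def unusual_whitespace(text):
    """List whitespace characters outside ASCII as (index, code point, name)."""
    return [
        (i, "U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ch.isspace() and ord(ch) > 127
    ]


# Hypothetical input: the number group separated by U+00A0 NO-BREAK SPACE.
sample = "Sven Svensson 212\u00a0584\u00a05242"
for index, code_point, name in unusual_whitespace(sample):
    print(index, code_point, name)
```

Applied to each input document (read with `open(path, encoding="utf-8")`), an empty result means the numbers are separated by ordinary spaces only.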

This example works:

Contact for company: Sven Svensson 584 5242 [email protected].

bennytieu, Apr 27 '17 10:04

I did some debugging: the first example is tokenized as ['Contact', 'for', 'company', ':', 'Sven', 'Svensson', '212Â\xa0584Â\xa05242', '[email protected]', '.']. I suspect the TypeError happens because some representation I rely on treats the numbers as individual tokens. I won't be able to fix this right now; is using Python 2 an option for you?
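The `Â\xa0` in that token is what a UTF-8-encoded U+00A0 NO-BREAK SPACE (bytes `C2 A0`) looks like when decoded as Latin-1, which suggests the digits were separated by non-breaking spaces. The sketch below shows one plausible way Python 2 and Python 3 could end up disagreeing on the number of tokens. This is an assumption, not a confirmed diagnosis: it presumes some step re-splits token text with a no-argument `split()`, and that the Python 2 code path handled the text as byte strings (Python 2 `str` is a byte string, which as far as I know splits only on ASCII whitespace, like Python 3 `bytes`):

```python
# Digits separated by U+00A0 NO-BREAK SPACE (inferred from the token above).
text = "212\u00a0584\u00a05242"

# Encoding as UTF-8 and decoding as Latin-1 reproduces the 'Â\xa0' pattern
# seen in the tokenizer output.
mojibake = text.encode("utf-8").decode("latin-1")
assert mojibake == "212\u00c2\u00a0584\u00c2\u00a05242"

# Python 3 str.split() with no argument splits on any Unicode whitespace,
# and U+00A0 counts as whitespace, so one tokenizer token becomes three.
py3_pieces = text.split()

# bytes.split() splits only on ASCII whitespace, so the same data stays a
# single piece (assumed here to mirror the Python 2 behavior).
byte_pieces = text.encode("utf-8").split()

print(py3_pieces)
print(byte_pieces)
```

Splitting on an explicit `" "` separator would not split on U+00A0 in either version, which is why the choice of `split()` call matters here.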

smartschat, Apr 27 '17 11:04

In the meantime, I will try running it on Python 2 or just skip this special case. I'm doing a study on efficiency, so running it on Python 3 would be ideal. Thank you for your quick reply!
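Instead of skipping affected documents, one possible workaround is to normalize unusual spaces in the input files before running cort-predict-raw on Python 3. A minimal sketch, assuming (unverified) that non-breaking spaces are what trigger the error; `normalize_spaces` and `normalize_dir` are hypothetical helpers, not part of cort:

```python
import unicodedata
from pathlib import Path


def normalize_spaces(text):
    """Replace every Unicode space separator (category Zs, e.g. U+00A0)
    with a plain ASCII space; all other characters are kept as-is."""
    return "".join(" " if unicodedata.category(ch) == "Zs" else ch for ch in text)


def normalize_dir(src_dir, dst_dir):
    """Write a space-normalized copy of every .txt file in src_dir to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        (dst / path.name).write_text(normalize_spaces(text), encoding="utf-8")
```

For example, `normalize_dir` could write cleaned copies of the documents in `~/data/pilot_44_docs` (expand `~` first, e.g. with `os.path.expanduser`) to a new directory whose files are then passed to `-in`. Because each separator is replaced by exactly one character, character offsets in the text stay the same. Tabs and newlines are control characters (category Cc), so they are left untouched.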

bennytieu, Apr 27 '17 11:04