knowledge_graph_from_unstructured_text

IndexError: list index out of range during coreference resolution

Open anindyasdas opened this issue 4 years ago • 3 comments

Traceback (most recent call last):
  File "knowledge_graph.py", line 292, in <module>
    main()
  File "knowledge_graph.py", line 287, in main
    doc = resolve_coreferences(doc,stanford_core_nlp_path,named_entities,verbose)
  File "knowledge_graph.py", line 217, in resolve_coreferences
    result = coref_obj.resolve_coreferences(corefs,doc,ner,verbose)
  File "knowledge_graph.py", line 200, in resolve_coreferences
    replaced_sent = words[i] + " "+ replaced_sent
IndexError: list index out of range


Data file for reproducing the error: input_data (1).txt

Preliminary analysis suggests the following: the file contains tokens such as "North-East" and "third-largest". The Stanford tokenizer used for coreference splits on hyphens, while NLTK does not. As a result, NLTK yields 37 tokens for the corresponding sentence, which does not match the coreference indices (computed over 41 tokens), e.g. ['North', '-', 'East', 'third', '-', 'largest'].
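The mismatch can be reproduced in isolation. The following is a minimal sketch, not the code from knowledge_graph.py: the sentence is made up, `whitespace_tokens` stands in for a tokenizer that keeps hyphenated words whole, and `hyphen_splitting_tokens` is a regex stand-in for a tokenizer that splits on hyphens. Neither function is the actual NLTK or Stanford tokenizer.

```python
import re


def whitespace_tokens(sentence):
    """Stand-in for a tokenizer that keeps hyphenated words as one token."""
    return sentence.split()


def hyphen_splitting_tokens(sentence):
    """Stand-in for a tokenizer that emits '-' as a separate token."""
    # Runs of word characters, or any single non-space punctuation character.
    return re.findall(r"\w+|[^\w\s]", sentence)


# Hypothetical sentence, not taken from the attached data file.
sentence = "The North-East region is the third-largest producer"

coarse = whitespace_tokens(sentence)
fine = hyphen_splitting_tokens(sentence)

# Each hyphenated word contributes two extra tokens in the fine tokenization.
print(len(coarse), coarse)
print(len(fine), fine)

# An index that is valid for the fine tokens can fall outside the coarse list,
# which is the same failure mode as `words[i]` in the traceback.
index_from_fine_tokenizer = len(fine) - 1
try:
    coarse[index_from_fine_tokenizer]
except IndexError as exc:
    print("IndexError:", exc)
```

Two hyphenated words add four tokens here, which matches the gap between 37 and 41 reported above.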

anindyasdas avatar Nov 25 '20 15:11 anindyasdas

I'm also having the same issue. Can anyone help, please?

Kojo7 avatar May 18 '21 18:05 Kojo7

The issue is mainly due to the use of two different tokenizers. Their outputs diverge on "-" and other special characters, so the token indices no longer line up. Use the spaCy tokenizer instead of NLTK or whitespace splitting.
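The underlying idea is that the token list being indexed must come from a tokenizer that splits text the same way as the one that produced the coreference indices. The sketch below illustrates this with a regex stand-in rather than spaCy. The names `tokenize` and `replace_span` are hypothetical helpers, and the 0-based, end-exclusive span convention is an assumption made for the example, not the convention used by the coreference output.

```python
import re


def tokenize(sentence):
    """Stand-in tokenizer that emits punctuation such as '-' as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def replace_span(tokens, start, end, replacement):
    """Return a copy of tokens with tokens[start:end] swapped for replacement.

    Indices are checked first so that a tokenizer mismatch surfaces as a
    clear error instead of a bare IndexError deep inside a loop.
    """
    if not (0 <= start < end <= len(tokens)):
        raise ValueError(
            f"span ({start}, {end}) does not fit a sentence of {len(tokens)} tokens"
        )
    return tokens[:start] + list(replacement) + tokens[end:]


# Hypothetical sentence and mention span, both based on the same tokenization.
tokens = tokenize("It is the third-largest producer")
resolved = replace_span(tokens, 0, 1, ["Assam"])
print(" ".join(resolved))
```

Because the span and the token list share one tokenization, the indices stay in range even though the sentence contains a hyphenated word.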

anindyasdas avatar May 19 '21 16:05 anindyasdas

Thanks very much, that helped

Kojo7 avatar May 19 '21 18:05 Kojo7