knowledge_graph_from_unstructured_text
IndexError: list index out of range during coreference resolution
Traceback (most recent call last):
  File "knowledge_graph.py", line 292, in <module>
    main()
  File "knowledge_graph.py", line 287, in main
    doc = resolve_coreferences(doc,stanford_core_nlp_path,named_entities,verbose)
  File "knowledge_graph.py", line 217, in resolve_coreferences
    result = coref_obj.resolve_coreferences(corefs,doc,ner,verbose)
  File "knowledge_graph.py", line 200, in resolve_coreferences
    replaced_sent = words[i] + " "+ replaced_sent
IndexError: list index out of range
A data file for reproducing the error is attached: input_data (1).txt
Preliminary analysis: the file contains tokens such as "North-East" and "third-largest". The Stanford tokenizer used for coreference splits on hyphens, while NLTK does not. So under NLTK the corresponding sentence has 37 tokens, which does not match the coreference indices (computed over 41 tokens): ['North', '-', 'East', 'third', '-', 'largest'].
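The mismatch can be reproduced without either library. This is only a minimal sketch: the two functions below are hypothetical stand-ins (plain whitespace splitting for the coarser tokenizer, a regex that splits on hyphens for the finer one), and the sentence is made up for illustration rather than taken from the attached file.

```python
import re

def coarse_tokens(sentence):
    # Stand-in for a tokenizer that keeps hyphenated words together.
    return sentence.split()

def fine_tokens(sentence):
    # Stand-in for a tokenizer that splits hyphens into separate tokens:
    # runs of word characters, or any single non-space punctuation mark.
    return re.findall(r"\w+|[^\w\s]", sentence)

sentence = "The third-largest city lies in the North-East region"

words = coarse_tokens(sentence)      # 8 tokens
coref_view = fine_tokens(sentence)   # 12 tokens

# An index computed against the finer tokenization ...
east_index = coref_view.index("East")

# ... can fall outside the coarser token list, which is the IndexError
# seen in the traceback.
try:
    words[east_index]
    index_error_raised = False
except IndexError:
    index_error_raised = True
```

Here each hyphenated word accounts for two extra tokens in the finer view, which is the same kind of gap as the 37 versus 41 tokens reported above.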
I'm also having the same issue. Can anyone help, please?
The issue is mainly due to the use of two different tokenizers, which disagree on how to handle "-" and other special characters. Use the spaCy tokenizer instead of NLTK or whitespace splitting.
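The underlying idea is that the token list being indexed should come from the same tokenizer that produced the coreference indices. Below is a rough sketch of that idea, not the code from knowledge_graph.py: `tokenize` is a hypothetical regex stand-in for whichever hyphen-splitting tokenizer you settle on, and `replace_span` is an illustrative helper that also validates indices instead of failing with a bare IndexError.

```python
import re

def tokenize(sentence):
    # Hypothetical stand-in tokenizer; swap in the tokenizer that
    # produced the coreference indices so both views agree.
    return re.findall(r"\w+|[^\w\s]", sentence)

def replace_span(tokens, start, end, replacement):
    # Replace tokens[start:end] with the replacement tokens.
    # Raise a descriptive error if the span does not fit the token list,
    # which usually signals a tokenizer mismatch.
    if not (0 <= start < end <= len(tokens)):
        raise ValueError(
            f"span ({start}, {end}) does not fit {len(tokens)} tokens; "
            "check that indices and tokens come from the same tokenizer"
        )
    return tokens[:start] + list(replacement) + tokens[end:]

sentence = "It is the third-largest port"
tokens = tokenize(sentence)

# Assume the coreference step reported the mention "It" at span (0, 1)
# over this same tokenization.
resolved = replace_span(tokens, 0, 1, ["The", "harbour"])
resolved_text = " ".join(resolved)
```

Joining with spaces leaves hyphens surrounded by spaces ("third - largest"), so a real pipeline would need its own detokenization step if the original spacing matters.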
Thanks very much, that helped