ConvoKit
Unable to add dependency parses
Hi Caleb @calebchiam,
I'm trying to perform politeness prediction using the example notebook given here, and I run into an error while adding dependency parses. Currently, I'm running:
from convokit import Corpus, TextParser, download
wiki_corpus = Corpus(download("wikipedia-politeness-corpus"))
parser = TextParser(verbosity=1000)
Then, when I run
wiki_corpus = parser.transform(wiki_corpus)
It gives me the following error:
StopIteration Traceback (most recent call last)
<ipython-input-12-cffb5c2034e3> in <module>
----> 1 wiki_corpus = parser.transform(wiki_corpus)
/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textProcessor.py in transform(self, corpus)
65 result = self.proc_fn(text_entry)
66 else:
---> 67 result = self.proc_fn(text_entry, self.aux_input)
68 if self.multi_outputs:
69 for res, out in zip(result, self.output_field):
/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_text_wrapper(self, text, aux_input)
74
75 def _process_text_wrapper(self, text, aux_input={}):
---> 76 return process_text(text, aux_input.get('mode','parse'),
77 aux_input.get('sent_tokenizer',None), aux_input.get('spacy_nlp',None))
78
/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in process_text(text, mode, sent_tokenizer, spacy_nlp)
124 offset = 0
125 for sent in sents:
--> 126 curr_sent = _process_sentence(sent, mode, offset)
127 sentences.append(curr_sent)
128 offset += len(curr_sent['toks'])
/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_sentence(sent_obj, mode, offset)
93 tokens = []
94 for token_obj in sent_obj:
---> 95 tokens.append(_process_token(token_obj, mode, offset))
96 if mode == 'parse':
97 return {'rt': sent_obj.root.i - offset, 'toks': tokens}
/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_token(token_obj, mode, offset)
86 token_info['dep'] = token_obj.dep_
87 if token_info['dep'] != 'ROOT':
---> 88 token_info['up'] = next(token_obj.ancestors).i - offset
89 token_info['dn'] = [x.i - offset for x in token_obj.children]
90 return token_info
StopIteration:
However, the transform using PolitenessStrategies does work:
from convokit import PolitenessStrategies
ps = PolitenessStrategies()
wiki_corpus = ps.transform(wiki_corpus, markers=True)
This works perfectly. Only the TextParser is giving errors. Any idea what the issue might be? Would be grateful if you could kindly have a look!
Thanks a lot, Abhinav
Hmm, based on the stack trace, this looks like an error caused by the spacy dependency. @tisjune, if you have the time, can you take a look at this and advise on how we should update the code?
Meanwhile, @BonJovi1, you can resolve this issue locally by uninstalling spaCy and reinstalling spacy==2.3.1. Make sure to re-download en_core_web_sm afterwards. I've tested this and it resolves the issue.
@BonJovi1 Looks like the problem first arises with the spaCy 3.2.0 release, so any release <=3.1.4 will work. Thanks for raising the issue; we'll release a fix for this soon (or feel free to make a PR yourself).
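For anyone who wants to check whether their environment falls in the affected range before running the parser, here is a minimal sketch. It relies only on the version boundary mentioned in this thread (3.2.0 and later affected, 3.1.4 and earlier fine); the helper names `parse_major_minor`, `is_affected`, and `installed_spacy_version` are made up for illustration and are not part of ConvoKit or spaCy. It does not know whether a later release has fixed the problem.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.1.4'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_affected(version: str) -> bool:
    """True if the version is at or above 3.2, the boundary reported in this thread.

    Assumption: this only encodes what the thread says; it cannot tell
    whether a newer spaCy or ConvoKit release has since resolved the issue.
    """
    return parse_major_minor(version) >= (3, 2)


def installed_spacy_version() -> Optional[str]:
    """Return the installed spaCy version, or None if spaCy is not installed."""
    try:
        return metadata.version("spacy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_spacy_version()
    if found is None:
        print("spaCy is not installed")
    elif is_affected(found):
        print(f"spaCy {found} is in the range reported as affected in this thread")
    else:
        print(f"spaCy {found} is in the range reported as working in this thread")
```

If the check reports an affected version, the workaround above (pinning spaCy to an earlier release and re-downloading en_core_web_sm) applies.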
Hi Caleb @calebchiam, thanks so much! Installing spacy == 2.3.1 did the trick and I'm now able to add dependency parses! :)
Thanks a bunch, Abhinav
Great to hear! We'll keep this issue open until we resolve it properly on our end.
Hi! We traced this issue back to some inconsistent behavior in spaCy's dependency parser and raised an issue with them to confirm. We will keep this issue open until the bug is fixed.
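To illustrate the failure mode in the traceback above: `next(token_obj.ancestors)` raises StopIteration whenever a token's dependency label is not 'ROOT' but its ancestors iterator is empty. The sketch below shows one defensive pattern, passing a default to `next`. It is not the ConvoKit fix, and `FakeToken` and `head_index` are hypothetical stand-ins so the example runs without spaCy; the stand-in only mimics the attributes that `_process_token` touches.

```python
from typing import Iterable, Iterator, Optional


class FakeToken:
    """Hypothetical stand-in for a spaCy token (not the real spaCy class)."""

    def __init__(self, i: int, dep_: str, ancestors: Iterable["FakeToken"] = ()):
        self.i = i                      # position of the token in the document
        self.dep_ = dep_                # dependency label, e.g. 'ROOT' or 'nsubj'
        self._ancestors = list(ancestors)

    @property
    def ancestors(self) -> Iterator["FakeToken"]:
        # Like the attribute used in the traceback, this yields ancestors lazily.
        return iter(self._ancestors)


def head_index(token: FakeToken, offset: int = 0) -> Optional[int]:
    """Return the sentence-relative index of the nearest ancestor, or None.

    Using next(..., None) avoids StopIteration when the iterator is empty.
    """
    parent = next(token.ancestors, None)
    return None if parent is None else parent.i - offset


root = FakeToken(5, "ROOT")
child = FakeToken(6, "nsubj", ancestors=[root])
orphan = FakeToken(7, "dep")  # non-ROOT label, but no ancestors

print(head_index(child, offset=5))   # nearest ancestor is the first token of the sentence
print(head_index(orphan, offset=5))  # no ancestor, so None instead of an exception
```

How a caller should treat a None result (skip the 'up' field, treat the token as a root, and so on) is a design decision this sketch leaves open.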