
Unable to add dependency parses

Open BonJovi1 opened this issue 3 years ago • 5 comments

Hi Caleb @calebchiam,

I'm trying to perform politeness prediction using the example notebook given here. I run into some errors while adding dependency parses. Currently, I'm doing

from convokit import Corpus, download, TextParser
wiki_corpus = Corpus(download("wikipedia-politeness-corpus"))
parser = TextParser(verbosity=1000)

And then when I do

wiki_corpus = parser.transform(wiki_corpus)

It gives me the following error:

StopIteration                             Traceback (most recent call last)
<ipython-input-12-cffb5c2034e3> in <module>
----> 1 wiki_corpus = parser.transform(wiki_corpus)

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textProcessor.py in transform(self, corpus)
     65                 result = self.proc_fn(text_entry)
     66             else:
---> 67                 result = self.proc_fn(text_entry, self.aux_input)
     68             if self.multi_outputs:
     69                 for res, out in zip(result, self.output_field):

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_text_wrapper(self, text, aux_input)
     74 
     75         def _process_text_wrapper(self, text, aux_input={}):
---> 76 		return process_text(text, aux_input.get('mode','parse'), 
     77 						aux_input.get('sent_tokenizer',None), aux_input.get('spacy_nlp',None))
     78 

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in process_text(text, mode, sent_tokenizer, spacy_nlp)
    124         offset = 0
    125         for sent in sents:
--> 126                 curr_sent = _process_sentence(sent, mode, offset)
    127                 sentences.append(curr_sent)
    128                 offset += len(curr_sent['toks'])

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_sentence(sent_obj, mode, offset)
     93         tokens = []
     94         for token_obj in sent_obj:
---> 95                 tokens.append(_process_token(token_obj, mode, offset))
     96         if mode == 'parse':
     97                 return {'rt': sent_obj.root.i - offset, 'toks': tokens}

/opt/anaconda3/lib/python3.8/site-packages/convokit/text_processing/textParser.py in _process_token(token_obj, mode, offset)
     86                 token_info['dep'] = token_obj.dep_
     87                 if token_info['dep'] != 'ROOT':
---> 88                         token_info['up'] = next(token_obj.ancestors).i - offset
     89                 token_info['dn'] = [x.i - offset for x in token_obj.children]
     90         return token_info

StopIteration: 

However, when I do the transform using PolitenessStrategies, that works!

from convokit import PolitenessStrategies
ps = PolitenessStrategies()
wiki_corpus = ps.transform(wiki_corpus, markers=True)

This works perfectly. Only the TextParser is giving errors. Any idea what the issue might be? Would be grateful if you could kindly have a look!

Thanks a lot, Abhinav

BonJovi1 avatar Dec 13 '21 19:12 BonJovi1

Hmm, based on the stack trace, this looks like an error caused by the spacy dependency. @tisjune, if you have the time, can you take a look at this and advise on how we should update the code?
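For reference, the failing call is `next(token_obj.ancestors)` in `_process_token`, which raises `StopIteration` when a token that is not labelled `ROOT` nonetheless has no ancestors. A defensive variant (just a sketch of one possible direction, not necessarily the fix we should ship) would be to give `next()` a default:

```python
# Hypothetical guard for the failing line in _process_token: fall back
# gracefully when a non-ROOT token reports an empty ancestors iterator
# (observed with newer spaCy releases).
parent = next(token_obj.ancestors, None)
if parent is not None:
    token_info['up'] = parent.i - offset
```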

Meanwhile, @BonJovi1, you can resolve this locally by uninstalling spaCy and reinstalling `spacy==2.3.1`. Make sure to re-download `en_core_web_sm` afterwards (`python -m spacy download en_core_web_sm`). I've tested this and it resolves the issue.
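After the reinstall, a quick sanity check along these lines (just a sketch; it assumes the downgraded environment and the re-downloaded model) should confirm that parsing works again:

```python
import spacy

# Expect 2.3.1 after the downgrade.
print(spacy.__version__)

# A single sentence should now parse without raising StopIteration.
nlp = spacy.load("en_core_web_sm")
doc = nlp("Could you please have a look at this?")
print([(tok.text, tok.dep_, tok.head.text) for tok in doc])
```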

calebchiam avatar Dec 13 '21 19:12 calebchiam

@BonJovi1 Looks like the problem first arises with the spaCy 3.2.0 release, so any release <= 3.1.4 will work. Thanks for raising the issue -- we'll release a fix for this soon (or feel free to make a PR yourself).
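(For anyone who wants to see the trigger directly: the sketch below, which assumes the corpus from the notebook is already loaded as `wiki_corpus`, flags tokens where a newer spaCy yields an empty `ancestors` iterator even though the dependency label is not `ROOT` -- exactly the condition that makes `next(token_obj.ancestors)` raise `StopIteration`.)

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Scan the corpus for utterances that trip the bug: a token whose dep_ is
# not 'ROOT' but whose ancestors iterator is empty.
for utt in wiki_corpus.iter_utterances():
    doc = nlp(utt.text)
    for tok in doc:
        if tok.dep_ != "ROOT" and next(tok.ancestors, None) is None:
            print("problem token", repr(tok.text), "in utterance", utt.id)
            break
```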

calebchiam avatar Dec 14 '21 02:12 calebchiam

Hi Caleb @calebchiam, thanks so much! Installing `spacy==2.3.1` did the trick and I'm now able to add dependency parses! :)

Thanks a bunch, Abhinav

BonJovi1 avatar Dec 14 '21 06:12 BonJovi1

Great to hear! We'll keep this issue open until we resolve it properly on our end.

calebchiam avatar Dec 14 '21 06:12 calebchiam

Hi! We traced this issue back to some inconsistent behavior of SpaCy's dependency relation parser and raised an issue with them to confirm. We will keep this issue open until the bug is fixed.

khonzoda avatar Jan 10 '22 18:01 khonzoda