UdifyTextPredictor fails when output_conllu=true

Open ranjita-naik opened this issue 4 years ago • 4 comments

I'm feeding this raw input to predict.py: "Il est assez sûr de lui pour danser et chanter en public ." I set the --raw_text flag and, since I want the output in CoNLL-U format, I set output_conllu=True in UdifyTextPredictor.

The dump_line in UdifyPredictor is erroring out.

  File "udify/udify/predictors/text_predictor.py", line 63, in dump_line
    return self.predictor.dump_line(outputs)
  File "udify/udify/predictors/predictor.py", line 82, in dump_line
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in <listcomp>
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
ValueError: invalid literal for int() with base 10: 'N'

Could you please take a look?

Thanks, Ranjita

ranjita-naik avatar Jan 14 '21 18:01 ranjita-naik

Sorry for the late reply. I think there might be a bug in how the multiword IDs are handled. In this case, you don't have any multiword IDs because you input raw text. Can you try commenting out the block starting with if outputs["multiword_ids"]:?

Hyperparticle avatar Feb 06 '21 19:02 Hyperparticle

I'm running into the same problem. Even with the suggested workaround, the error persists.

huberemanuel avatar Jun 11 '21 00:06 huberemanuel

I also came across this issue. The problem is that outputs["multiword_ids"] is the string "None", not the value None. Because of this, the condition if outputs["multiword_ids"]: is always True, even when there are no multiword IDs. That is, even if the predicted tree contains no multiword tokens, the following code block is executed, and it raises the error because it tries to apply int() to the string 'N', the first letter of "None".

https://github.com/Hyperparticle/udify/blob/18d63ac1b2da5a1afea58f317ade79bc84910450/udify/predictors/predictor.py#L81-L84

I think commenting out these four lines should make the error go away.
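For anyone who would rather guard the block than comment it out, here is a minimal sketch of the idea. The helper name parse_multiword_ids is hypothetical and not part of udify; it only assumes that the field may arrive as None, as the string "None", or as a list of "start-end" strings.

```python
def parse_multiword_ids(raw):
    """Return [[id, start, end], ...] for a list of "start-end" strings.

    Hypothetical helper (not part of udify): treats None, the string "None",
    and empty values as "no multiword tokens".
    """
    if raw is None or raw == "None" or not raw:
        return []
    return [[mw_id] + [int(x) for x in mw_id.split("-")] for mw_id in raw]


# Reproduce the failure from the traceback: iterating over the string "None"
# yields the character "N" first, and int("N") raises ValueError.
try:
    [[i] + [int(x) for x in i.split("-")] for i in "None"]
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'N'

print(parse_multiword_ids("None"))   # []
print(parse_multiword_ids(["1-2"]))  # [['1-2', 1, 2]]
```

This only hides the symptom; it does not explain why the field is a string in the first place.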

gifdog97 avatar Oct 31 '21 08:10 gifdog97

But actually I found another problem... outputs["ids"] is also "None" (str) somehow, generating weird conllu as a result:

N	Un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	2	det	_	_
o	oppioide	oppioide	NOUN	_	Gender=Masc|Number=Sing	6	nsubj	_	_
n	è	essere	AUX	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	6	cop	_	_
e	un	uno	DET	_	Definite=Ind|Gender=Masc|Number=Sing|PronType=Art	6	det	_	_

As a temporary workaround we can use the list [1,2,...,n] instead, where n is the sentence length, but I think the underlying issue is that outputs['ids'] maps to an unexpected value. This might also be related to the issue I posted (I'm not sure). Could you check it?
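A rough sketch of that workaround, assuming the per-token words are available as a list. resolve_ids is a hypothetical name, not a udify function. The first print shows why the ID column above reads N, o, n, e: that output is consistent with the string "None" being iterated character by character alongside the tokens.

```python
def resolve_ids(raw_ids, words):
    """Return one ID per token.

    Hypothetical helper (not part of udify): keeps raw_ids only when it is a
    real list/tuple with one entry per word; otherwise numbers tokens 1..n.
    """
    if isinstance(raw_ids, (list, tuple)) and len(raw_ids) == len(words):
        return list(raw_ids)
    return list(range(1, len(words) + 1))


words = ["Un", "oppioide", "è", "un"]

# Pairing the string "None" with the tokens gives the odd ID column.
print(list(zip("None", words)))
# [('N', 'Un'), ('o', 'oppioide'), ('n', 'è'), ('e', 'un')]

print(resolve_ids("None", words))  # [1, 2, 3, 4]
```

Note that sequential numbering is only correct when the sentence has no multiword tokens, which is the case for raw text input as described earlier in this thread.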

gifdog97 avatar Oct 31 '21 08:10 gifdog97