udify
UdifyTextPredictor fails when output_conllu=true
I'm feeding this raw input to predict.py with the --raw_text flag: "Il est assez sûr de lui pour danser et chanter en public ." Since I want the output in CoNLL-U format, I've set output_conllu=True in UdifyTextPredictor.
dump_line in UdifyPredictor raises an error:
  File "udify/udify/predictors/text_predictor.py", line 63, in dump_line
    return self.predictor.dump_line(outputs)
  File "udify/udify/predictors/predictor.py", line 82, in dump_line
    multiword_ids = [[id] + [int(x) for x in id.split("-")] for id in outputs["multiword_ids"]]
  File "udify/udify/predictors/predictor.py", line 82, in
Could you please take a look?
Thanks, Ranjita
Sorry for the late reply. I think there might be a bug in how the multiword IDs are handled. In this case, you don't have any multiword IDs because you input raw text. Can you try commenting out the block starting with `if outputs["multiword_ids"]:`?
I'm running into the same problem; the error persists even with the suggested fix.
I also came across this issue.
The problem is that `outputs["multiword_ids"]` is the string `"None"`, not `None`. Because of this, the condition `if outputs["multiword_ids"]:` is always true, even when there are no multiword IDs.
That is, even if the predicted tree contains no multiword tokens, the following code block is executed and raises an error, because it tries to apply `int()` to the string `'N'`, the first letter of `"None"`.
https://github.com/Hyperparticle/udify/blob/18d63ac1b2da5a1afea58f317ade79bc84910450/udify/predictors/predictor.py#L81-L84
I think commenting out these four lines should make the error go away.
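To make the failure easier to see outside of UDify, here is a minimal sketch in plain Python. The `outputs` dict is only a stand-in for the predictor output described above, and `parse_multiword_ids` is a hypothetical helper written for illustration, not part of UDify.

```python
def parse_multiword_ids(raw):
    """Return [[id, start, end], ...] for multiword token IDs.

    Hypothetical helper, not part of UDify. Treats both None and the
    string "None" as "no multiword tokens".
    """
    if raw is None or raw == "None":
        return []
    return [[mw_id] + [int(x) for x in mw_id.split("-")] for mw_id in raw]


# Stand-in for the predictor output (assumed shape).
outputs = {"multiword_ids": "None"}

# The string "None" is non-empty and therefore truthy, so a plain
# `if outputs["multiword_ids"]:` check does not skip the block.
assert bool(outputs["multiword_ids"]) is True

# Iterating over the string yields single characters; int("N") raises ValueError.
try:
    [[i] + [int(x) for x in i.split("-")] for i in outputs["multiword_ids"]]
except ValueError as err:
    print("fails as reported:", err)

print(parse_multiword_ids("None"))   # []
print(parse_multiword_ids(["1-2"]))  # [['1-2', 1, 2]]
```

A guard like this only hides the symptom; it does not explain why the value arrives as a string in the first place.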
But I actually found another problem: `outputs["ids"]` is also the string `"None"` for some reason, which produces malformed CoNLL-U output:
N Un uno DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 2 det _ _
o oppioide oppioide NOUN _ Gender=Masc|Number=Sing 6 nsubj _ _
n è essere AUX _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 6 cop _ _
e un uno DET _ Definite=Ind|Gender=Masc|Number=Sing|PronType=Art 6 det _ _
As a temporary workaround, we can replace it with the list `[1, 2, ..., n]`, where n is the sentence length, but I think the underlying issue is that `outputs['ids']` holds an unexpected value.
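The temporary workaround could look roughly like the sketch below. `fallback_ids` is a hypothetical helper, and the `"words"` key is an assumption about the shape of the output dict. The `zip` line illustrates how a string in the ID column would pair one character with each token, assuming the columns are combined that way.

```python
def fallback_ids(ids, words):
    """Return usable token IDs, falling back to 1..n when ids is missing.

    Hypothetical helper, not part of UDify. Treats None and the string
    "None" as missing.
    """
    if ids is None or ids == "None":
        return list(range(1, len(words) + 1))
    return ids


# Assumed output shape: "words" holds the predicted tokens.
outputs = {"ids": "None", "words": ["Un", "oppioide", "è", "un"]}

# Zipping the string "None" with the tokens pairs one character per token,
# which matches the N / o / n / e values seen in the ID column.
print(list(zip(outputs["ids"], outputs["words"])))

print(fallback_ids(outputs["ids"], outputs["words"]))  # [1, 2, 3, 4]
```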
This might also be related to the issue I posted, though I'm not sure. Could you check it?