TypeError
I've been attempting to run analyses from the textcomplexity library but keep getting the following error:
TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'
Here's a deeper look at what's happening:
!txtcomplexity -i conllu 'output.conllu'
Traceback (most recent call last):
File "/usr/local/bin/txtcomplexity", line 12, in
TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'
Is this related to an error in my conllu file or how I'm using the textcomplexity library? Any help would be much appreciated! :)
Could you share the first few lines of your input file?
Sure! Here is what the first few lines of my conllu file look like:
1 # # X XX _ 2 dep _ _ 2 Mixtures mixture VERB VBZ _ 2 ROOT _ _ 3
SPACE _SP _ 2 dep _ _
4 The the DET DT _ 6 det _ _ 5 next next ADJ JJ _ 6 amod _ _ 6 time time NOUN NN _ 2 npadvmod _ _ 7 you you PRON PRP _ 8 nsubj _ _ 8 are be AUX VBP _ 6 relcl _ _ 9 at at ADP IN _ 8 prep _ _ 10 the the DET DT _ 11 det _ _ 11 beach beach NOUN NN _ 9 pobj _ _ 12 , , PUNCT , _ 2 punct _ _ 13 pick pick VERB VB _ 2 conj _ _ 14 up up ADP RP _ 13 prt _ _ 15 a a DET DT _ 16 det _ _ 16 handful handful NOUN NN _ 13 dobj _ _ 17 of of ADP IN _ 16 prep _ _ 18 sand sand NOUN NN _ 17 pobj _ _ 19 . . PUNCT . _ 2 punct _ _ 20
I don’t know if GitHub messed with the formatting, but it looks like the form of the third token is a literal newline character. The txtcomplexity tool assumes that all information for a token is on a single line, i.e. it cannot deal with tokens that contain literal newline characters and therefore span multiple lines. That would also fit the error message: if a line holds only the token ID, the other nine fields ('form' through 'misc') are missing. Out of curiosity: do you happen to know how that file was created?
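If it helps to locate the problem, here is a minimal sketch for finding such lines. It assumes the file is meant to follow the standard CoNLL-U layout (ten tab-separated fields per token line, comment lines starting with `#`, empty lines between sentences). `find_malformed_lines` is a hypothetical helper written for this thread, not part of textcomplexity.

```python
def find_malformed_lines(text):
    """Return (line_number, field_count) for every token line that
    does not have exactly ten tab-separated fields."""
    problems = []
    # Split on "\n" only, so that other separators inside a field
    # are not mistaken for line breaks.
    for number, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # tolerate Windows line endings
        if line == "" or line.startswith("#"):
            continue  # sentence boundary or comment line
        field_count = len(line.split("\t"))
        if field_count != 10:
            problems.append((number, field_count))
    return problems


if __name__ == "__main__":
    # Made-up sample: the second token's FORM is a literal newline,
    # so the token is split across two physical lines.
    sample = "\n".join([
        "# text = Mixtures",
        "\t".join(["1", "Mixtures", "mixture", "NOUN", "NNS",
                   "_", "0", "root", "_", "_"]),
        "2\t",
        "\tSPACE\t_SP\t_\t1\tdep\t_\t_",
        "",
    ])
    print(find_malformed_lines(sample))  # [(3, 2), (4, 8)]
```

To check a real file, read it with `open("output.conllu", encoding="utf-8").read()` and pass the result to the function; any reported line number points to a token that does not fit on one line.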
That makes sense, thanks! My PhD advisor wrote a Python program to convert txt files to conllu format; happy to share the code if you're interested in looking at it :)