TypeError

Open melissasunnivahill opened this issue 1 year ago • 4 comments

I've been attempting to run analyses from the textcomplexity library but keep getting the following error:

TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Here's a deeper look at what's happening:

!txtcomplexity -i conllu 'output.conllu'

Traceback (most recent call last):
  File "/usr/local/bin/txtcomplexity", line 12, in <module>
    textcomplexity.cli.main()
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/cli.py", line 194, in main
    sentences, graphs = zip(*conllu.read_conllu_sentences(f, ignore_case=args.ignore_case))
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 16, in read_conllu_sentences
    for sentence, sent_id in _read_conllu(f, ignore_case):
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 66, in _read_conllu
    sentence.append(UdToken(*fields))
TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Is this related to an error in my conllu file or how I'm using the textcomplexity library? Any help would be much appreciated! :)

melissasunnivahill, Jan 11 '24 23:01

Could you share the first couple of lines from your input file?

tsproisl, Jan 22 '24 07:01

Sure! Here is what the first few lines of my conllu file look like:

1 # # X XX _ 2 dep _ _
2 Mixtures mixture VERB VBZ _ 2 ROOT _ _
3

SPACE	_SP	_	2	dep	_	_

4 The the DET DT _ 6 det _ _
5 next next ADJ JJ _ 6 amod _ _
6 time time NOUN NN _ 2 npadvmod _ _
7 you you PRON PRP _ 8 nsubj _ _
8 are be AUX VBP _ 6 relcl _ _
9 at at ADP IN _ 8 prep _ _
10 the the DET DT _ 11 det _ _
11 beach beach NOUN NN _ 9 pobj _ _
12 , , PUNCT , _ 2 punct _ _
13 pick pick VERB VB _ 2 conj _ _
14 up up ADP RP _ 13 prt _ _
15 a a DET DT _ 16 det _ _
16 handful handful NOUN NN _ 13 dobj _ _
17 of of ADP IN _ 16 prep _ _
18 sand sand NOUN NN _ 17 pobj _ _
19 . . PUNCT . _ 2 punct _ _
20

melissasunnivahill, Jan 22 '24 19:01

I don’t know if GitHub messed with the formatting, but it seems like the third token is a newline character? The txtcomplexity tool assumes that token information is on a single line, i.e. it cannot deal with tokens that contain literal newline characters and therefore span multiple lines. Out of curiosity: Do you happen to know how that file was created?
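To check a file for this before running txtcomplexity, something like the following might help. It is only a rough sketch, not code from textcomplexity: the `UdToken` here is a stand-in namedtuple whose field names come from the error message (with `id` assumed as the first column), and `find_malformed_lines` is a hypothetical helper. It flags every token line that does not have exactly 10 tab-separated fields, which is what a token containing a literal newline produces.

```python
from collections import namedtuple

# Stand-in for textcomplexity's UdToken (assumption: the real class is not shown here).
# Field names follow the error message; "id" is assumed to be the first column.
UdToken = namedtuple(
    "UdToken",
    ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"],
)


def find_malformed_lines(lines):
    """Return (line_number, field_count) for token lines without exactly 10 fields."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences; '#' lines are comments
        fields = line.split("\t")
        if len(fields) != len(UdToken._fields):
            problems.append((number, len(fields)))
    return problems


# A token whose form and lemma are literal newlines ends up spread over three lines.
sample = [
    "2\tMixtures\tmixture\tVERB\tVBZ\t_\t2\tROOT\t_\t_\n",
    "3\t\n",
    "\t\n",
    "\tSPACE\t_SP\t_\t2\tdep\t_\t_\n",
]
print(find_malformed_lines(sample))  # [(2, 2), (3, 2), (4, 8)]
```

Passing a lone `"3"` to the stand-in, i.e. `UdToken(*["3"])`, raises the same kind of `TypeError` about 9 missing positional arguments as in the traceback.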

tsproisl, Jan 22 '24 20:01

That makes sense, thanks! My PhD advisor wrote a Python program to convert txt files to conllu format; happy to add the code if you're interested in looking at it :)
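In case it is useful for the converter, here is one possible way to keep every token on a single line when writing the file. This is only a sketch under assumptions: I haven't shown the actual converter, so `conllu_line` and its parameters are hypothetical names, and replacing a whitespace-only form with `_` is just one option (dropping such tokens would also work, but then the head indices need renumbering).

```python
def conllu_line(index, form, lemma, upos, xpos, head, deprel):
    """Format one token as a single tab-separated line with 10 fields (hypothetical helper)."""

    def clean(value):
        # Collapse runs of whitespace (newlines and tabs included) into single spaces
        # so the token cannot spill onto another line; use "_" if nothing is left.
        return " ".join(str(value).split()) or "_"

    # FEATS, DEPS and MISC are left unspecified ("_") in this sketch.
    fields = [index, form, lemma, upos, xpos, "_", head, deprel, "_", "_"]
    return "\t".join(clean(field) for field in fields)


# The problematic third token from above, now written on one line.
line = conllu_line(3, "\n", "\n", "SPACE", "_SP", 2, "dep")
print(line.split("\t"))
```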

melissasunnivahill, Jan 23 '24 06:01