gum

Tagged as nominal, should actually be VERB?

Open • nschneid opened this issue 2 years ago • 4 comments

http://match.grew.fr/?corpus=UD_English-GUM@dev&custom=61f88fb74e07f

nschneid • Feb 01 '22 01:02

This is a tricky one... We don't manually tag UPOS, and for NNPs that are VERB (or ADJ) we rely on a few obvious deprels in the conversion (e.g. NNP+amod -> VERB if the form matches .*(ed|ing) and lemma != form, else ADJ).
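For anyone who wants to experiment, here is a minimal Python sketch of the NNP+amod rule as described above. It assumes a hypothetical, simplified Token record (not GUM's actual build code) and reads "NNP+amod" as an NNP token attached to its head via amod; the regex and the lemma check are applied literally and case-sensitively.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    # Hypothetical, simplified token record (not GUM's real data structure).
    form: str
    lemma: str
    xpos: str    # PTB tag, e.g. "NNP"
    deprel: str  # UD relation to the head, e.g. "amod"


# Participle-looking form: ends in -ed or -ing (case-sensitive).
PARTICIPLE_RE = re.compile(r".*(ed|ing)")


def upos_for_amod_nnp(tok: Token) -> Optional[str]:
    """Return "VERB" or "ADJ" for an NNP attached as amod, else None (rule doesn't apply)."""
    if tok.xpos != "NNP" or tok.deprel != "amod":
        return None
    # Participle-like form plus a lemma that differs from the form -> VERB.
    if PARTICIPLE_RE.fullmatch(tok.form) and tok.lemma != tok.form:
        return "VERB"
    return "ADJ"


print(upos_for_amod_nnp(Token("Fooled", "fool", "NNP", "amod")))  # VERB
print(upos_for_amod_nnp(Token("Red", "Red", "NNP", "amod")))      # ADJ
```

Because the lemma comparison is literal, a proper-noun lemma that just copies the form (Red/Red) falls through to ADJ even though the form ends in -ed.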

The cases this query identifies are heads, so it's not easy to find all of them. I agree that the ones with an obj dependent must be verbs (the same goes for .*:pass and a few other relations), but even that won't tell us the right morphology. Consider some hypothetical movies called:

  • I Fooled/NNP Destiny
  • I have Fooled/NNP Destiny
  • Fooled

For the first two we can tell they are verbs because they have an obj, but I can't tell whether they are VerbForm=Part or VerbForm=Fin. In the last case I'm not sure I can tell anything: is it a verb? Finite? An adjective? I could write some rules to catch maybe 70% of cases, but I'm not sure that's actually better than leaving it as is (at least then it's consistent).
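The obj / .*:pass observation can be sketched as a head-based check. This is a hypothetical Python helper over a simplified CoNLL-U-like token list, not part of the GUM build; it only flags NNP heads that take an obj or *:pass dependent and deliberately says nothing about VerbForm, which is exactly the part such a rule cannot resolve.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Hypothetical, simplified CoNLL-U-like token.
    id: int
    form: str
    xpos: str
    head: int    # 0 = root
    deprel: str


# Dependents only a verb can take: obj, or any :pass subtype (nsubj:pass, aux:pass, ...).
VERBAL_DEP_RE = re.compile(r"(?:obj|.*:pass)")


def verbal_nnp_heads(sentence: List[Token]) -> List[int]:
    """IDs of NNP tokens heading an obj or *:pass dependent.

    These are VERB candidates only; Fin vs. Part is left undecided.
    """
    by_id = {tok.id: tok for tok in sentence}
    flagged = set()
    for dep in sentence:
        head = by_id.get(dep.head)  # None for the root's head (id 0)
        if head is not None and head.xpos == "NNP" and VERBAL_DEP_RE.fullmatch(dep.deprel):
            flagged.add(head.id)
    return sorted(flagged)


# "I Fooled Destiny" with Fooled mis-tagged as NNP.
title = [
    Token(1, "I", "PRP", 2, "nsubj"),
    Token(2, "Fooled", "NNP", 0, "root"),
    Token(3, "Destiny", "NNP", 2, "obj"),
]
print(verbal_nnp_heads(title))  # [2]
```

On the three titles above it would flag Fooled in the first two and nothing in the bare "Fooled", matching the limits described in the comment.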

amir-zeldes • Feb 03 '22 16:02

Not consistent with other English corpora, though. I would err on the side of tagging it VERB if it can be interpreted as a verb.

nschneid • Feb 03 '22 18:02

Right, I'm not saying I want it to be inconsistent with other corpora; I'm just saying I have no reliable way of doing it automatically, and I currently don't have enough resources to manually go over all NNPs in the corpus. I will leave this issue open in case someone can do it in the future. I'll add the help-wanted tag, but anyone thinking of helping with this should talk with me first, since it would need to be done pre-conllu, in _build/src/.

amir-zeldes • Feb 03 '22 18:02

Note for whoever takes this on: WordNet may be helpful here.
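One way WordNet could plug in, sketched in Python: ask a verb lemmatizer whether the lowercased form can be a verb and, erring on the side of VERB as suggested earlier in the thread, retag when it can or when the token already has a verbal dependent. The lemmatizer is a parameter; with NLTK installed one could pass lambda w: wordnet.morphy(w, wordnet.VERB) (from nltk.corpus import wordnet), but the toy lexicon and suffix stripping below are hypothetical stand-ins so the sketch runs without WordNet data. Falling back to PROPN is also an assumption.

```python
from typing import Callable, Optional, Tuple

# Maps a lowercased word form to a verb lemma, or None if it can't be a verb.
VerbLemmatizer = Callable[[str], Optional[str]]

# Hypothetical toy lexicon standing in for WordNet's verb lemmas.
TOY_VERB_LEMMAS = {"fool", "burn", "love"}


def toy_verb_lemmatizer(word: str) -> Optional[str]:
    """Rough stand-in for a WordNet verb lookup: strip -ed/-ing/-s, check the toy lexicon."""
    candidates = [word]
    for suffix, replacement in (("ed", ""), ("ed", "e"), ("ing", ""), ("ing", "e"), ("s", "")):
        if word.endswith(suffix):
            candidates.append(word[: -len(suffix)] + replacement)
    for cand in candidates:
        if cand in TOY_VERB_LEMMAS:
            return cand
    return None


def suggest_upos(form: str, has_verbal_dependent: bool,
                 verb_lemmatizer: VerbLemmatizer) -> Tuple[str, Optional[str]]:
    """Suggest (UPOS, verb lemma) for an NNP token, erring toward VERB."""
    lemma = verb_lemmatizer(form.lower())
    if has_verbal_dependent or lemma is not None:
        return "VERB", lemma
    return "PROPN", None


print(suggest_upos("Fooled", False, toy_verb_lemmatizer))   # ('VERB', 'fool')
print(suggest_upos("Destiny", False, toy_verb_lemmatizer))  # ('PROPN', None)
```

This still leaves VerbForm open, and a bare "Fooled" becomes VERB even if an adjectival reading was intended, so the output is a list of candidates for manual review rather than a final tag.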

nschneid • Feb 07 '22 03:02