Ordinal Superlative construction ("[3rd tallest] building")

Open nschneid opened this issue 2 years ago • 10 comments

http://match.grew.fr/?corpus=UD_English-GUM@dev&custom=6221941141b83&clustering=X.upos reveals inconsistent treatment of both UPOS and deprels.

The ADJ guidelines specify that the ordinal in these cases should be tagged as ADJ despite modifying another adjective (presumably as advmod).

nschneid • Mar 04 '22 04:03
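For anyone who wants to reproduce this kind of tally outside Grew-match, here is a minimal sketch that counts (UPOS, deprel) pairs for ordinals immediately followed by a superlative in a CoNLL-U string. It is not the query behind the link above. It assumes ordinals carry `NumType=Ord` and superlatives carry `Degree=Sup` in FEATS; the function names and the four-token sample are made up for illustration.

```python
from collections import Counter


def read_sentences(conllu_text):
    """Yield sentences as lists of token dicts from a CoNLL-U string."""
    sent = []
    for line in conllu_text.splitlines():
        line = line.strip()
        if not line:
            if sent:
                yield sent
                sent = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        sent.append({
            "form": cols[1],
            "upos": cols[3],
            "feats": set(cols[5].split("|")),
            "deprel": cols[7],
        })
    if sent:
        yield sent


def tally_ordinal_superlatives(conllu_text):
    """Count (UPOS, deprel) of ordinals directly preceding a superlative."""
    counts = Counter()
    for sent in read_sentences(conllu_text):
        for tok, nxt in zip(sent, sent[1:]):
            # Assumption: FEATS mark ordinals and superlatives this way.
            if "NumType=Ord" in tok["feats"] and "Degree=Sup" in nxt["feats"]:
                counts[(tok["upos"], tok["deprel"])] += 1
    return counts


# Invented sample showing one possible annotation of "the third tallest building".
sample = "\n".join("\t".join(row) for row in [
    ["1", "the", "the", "DET", "DT", "_", "4", "det", "_", "_"],
    ["2", "third", "third", "ADJ", "JJ", "Degree=Pos|NumType=Ord", "3", "advmod", "_", "_"],
    ["3", "tallest", "tall", "ADJ", "JJS", "Degree=Sup", "4", "amod", "_", "_"],
    ["4", "building", "building", "NOUN", "NN", "Number=Sing", "0", "root", "_", "_"],
])
print(tally_ordinal_superlatives(sample))
```

More than one key in the resulting counter for a real treebank would point to the kind of inconsistency reported here.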

I can make these cases consistent, but it looks like we both agree the correct deprel is advmod, which leads to the question of why tag them as ADJ and not ADV? If something is "third biggest", then that describes "in what way is it big" or "how big is it?", which for me means it's an adverb (interrogable by "how", so manner or extent in this case).

Can we move to amend this guideline? I'm happy to make them all ADV + advmod.

amir-zeldes • Mar 04 '22 16:03

It's a case of productive extension—any ordinal number can be used in this construction; does that make it zero-derivation of ADV? I don't necessarily have a strong opinion but before changing the guideline we would need to hear why it was written that way.

nschneid • Mar 04 '22 17:03

Oh also this construction doesn't just modify adjectives:

  • Sam has [the third most apples]

So this would be amod(apples, third)?

nschneid • Mar 04 '22 17:03

Ever since the UD validator has put such an emphasis on equating advmod with ADV, it's been my understanding that adverbially used (morphological) adjectives should also be tagged ADV. This seems especially straightforward for English, since many morphologically unmarked items are regularly ADVs ("do something quick/ADV"), so I don't see the motivation for ADJ here in particular (it's not like we don't assume zero derivation for things like doing something quick, nice, fast etc.)

For "third most apples" I would have done advmod(most, third). I think the amod reading off the noun would mean something like there being three "most apples" instances, of which Sam is the third. So something like "Sam has (won the) third (iteration of the) most apples (award)". If it limits the scope of it being "most" (not absolutely most but third most), then it should be a child of "most".

Either way I'm curious what @dan-zeman and others think about this.

amir-zeldes • Mar 04 '22 18:03

The validator will not complain if it encounters an ADJ attached as advmod. The validator mainly wants to avoid NOUN+advmod (because nouns should be obl instead), and VERB+advmod (because those should be advcl instead).

If I understand correctly what the construction is supposed to mean, then I think that third should be attached to most and not to apples. Then advmod is probably more expected than amod, although I don't feel strongly about it. But I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

dan-zeman • Mar 04 '22 19:03
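The rule as described in this comment can be sketched as follows. This is not the UD validator's actual code, only a simplified illustration of the stated behaviour: ADJ + advmod passes, while NOUN + advmod and VERB + advmod are flagged with the relation suggested instead. The token tuples, `SUGGESTED_DEPREL`, and `check_advmod` are hypothetical names.

```python
# Hypothetical mapping: UPOS tags discouraged under advmod -> suggested relation.
SUGGESTED_DEPREL = {"NOUN": "obl", "VERB": "advcl"}


def check_advmod(tokens):
    """Return warnings for advmod dependents with a discouraged UPOS.

    `tokens` is a list of (form, upos, deprel) tuples, a simplified
    stand-in for CoNLL-U rows.
    """
    warnings = []
    for form, upos, deprel in tokens:
        base = deprel.split(":")[0]  # ignore subtypes such as advmod:emph
        if base == "advmod" and upos in SUGGESTED_DEPREL:
            warnings.append(
                f"{form}/{upos} attached as advmod; "
                f"expected {SUGGESTED_DEPREL[upos]}"
            )
    return warnings


example = [
    ("third", "ADJ", "advmod"),    # accepted: ADJ is not restricted
    ("hours", "NOUN", "advmod"),   # flagged: nominals should use obl
    ("running", "VERB", "advmod"), # flagged: clauses should use advcl
]
for warning in check_advmod(example):
    print(warning)
```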

> I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

We're agreed on the attachment, but this part surprised me: if functioning as an adverb (advmod) is separate from being morphologically an adverb (ADV), then why not accept NOUN+advmod too? The reason we don't attach these as just obl is that they are unmediated (they look like objects in "I ran three hours"), so as a compromise we have subtypes like :npmod, :tmod etc., inherited from Stanford Dependencies. But if being adverbial is just a function, we could have tagged them as advmod with a non-ADV POS as well, so this seems inconsistent.

Would you also tag the following as adjectives?

  • It's better to do it right/?? rather than cheap/??
  • We should go home first/??
  • Prices were much/?? higher
  • Think long/?? and hard/?? about it

amir-zeldes • Mar 08 '22 19:03

> > I wouldn't change the tag of third from ADJ to ADV just because it occurs in such a construction.

> We're agreed on the attachment, but this part surprised me - if functioning as an adverb (advmod) is separate from being morphologically an adverb (ADV), then why not accept NOUN+advmod too? ... But if being adverbial is just a function, ...

Because nominals and modifier words are different categories in the top-level UD taxonomy. Adjectives and adverbs are both modifier words, so I see at least some room for debate. But nouns are nominals, hence no advmod is allowed for them. Think of obl as the label for "being adverbial" that is used with nominals.

> Would you also tag the following as adjectives?

Maybe... or maybe not. It depends on how you want to define adverbs in English. That has been a mystery to me ever since I learned that the -ly suffix is not obligatory.

dan-zeman • Mar 09 '22 10:03

I see no need to reinvent the wheel on English ADJ vs. ADV. If a word like "cheap" or "long" could be replaced by "carefully" but not "careful", it should be ADV.

Regarding ordinal numbers, PTB says always ADJ, so it seems easiest to stick with that:

[image: excerpt from the PTB tagging guidelines on ordinal numbers]

This construction is special, which is why they needed to mention it (and we should document it), but I think advmod(largest/ADJ, fourth/ADJ) is an acceptable option.

nschneid • Mar 09 '22 13:03

> it should be ADV.

+1 !

> Regarding ordinal numbers, PTB says always ADJ, so it seems easiest to stick with that

The first part of that image is curious and not in line with the data (see below), but I think you're misreading the second guideline: it says "compounds of the form fourth-largest", but you need to keep in mind that these were not tokenized apart in the original PTB, so they are just saying the whole thing (headed by "largest") is an adjective. If you look at OntoNotes, which contains the re-tokenized PTB and which I take to be the successor of PTB, you will see that a majority of cases tag the modifier as RB (admittedly it's 26:17, so not a huge majority), including in WSJ:

  • They accounted for a hefty 16 % of New York Stock Exchange volume Monday , the fourth/RB busiest session ever (wsj_1863)
  • 700 billion yen -LRB- $ 4.93 billion -RRB- to 1.05 trillion yen -- the second/RB largest amount this year (wsj_1187)
  • the $ 67 billion measure is the second/RB largest of the annual domestic spending (wsj_2044)

And similarly in the newer genres added by ON:

  • Japan is the second/RB largest economy in the world (cctv_0002)
  • the center of the second/RB largest city in Iraq , Basra (cnn_0267)
  • Costco , the third/RB largest superstore retailer in the United States (ectb_1043)

I think the "substitution by -ly" test suggests that things like sentence initial ordinals ("First, ..." = "Firstly", "Second" = "Secondly") should be tagged as ADV as well, and again ON backs this up:

  • First/RB , according to the Caijing report , Luneng 's privatization process lasted 11 years (c2e_0023)
  • Second/RB , a long - term shareholder of a good company need n't worry too much (wsj_1986)

A query for "First ," and "Second ," shows the skew here is much stronger, with 91:13 in favor of RB (plus 5 cases of LS, oddly, even though it's spelled out as a word!). I don't think ordinals should be given a unique analysis when they fit the same normal ADV distribution tests as regular adverbs, and though I might have agreed if there was a huge precedent for doing this for consistency reasons, it seems ON doesn't do it either.

amir-zeldes • Mar 09 '22 14:03
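To put the two ratios in this comment on a common scale, the short calculation below converts them to percentages. The counts (26:17, 91:13, and the 5 LS cases) come from the comment; the helper name `share` and the labels are only illustrative, and the non-RB side is treated as a single "other" bucket because the comment does not break it down.

```python
def share(rb, other):
    """Proportion of cases tagged RB out of all counted cases."""
    return rb / (rb + other)


# (RB count, other count) as reported for OntoNotes in the comment above.
counts = {
    "ordinal + superlative": (26, 17),
    'sentence-initial "First ," / "Second ,"': (91, 13),
}

for label, (rb, other) in counts.items():
    print(f"{label}: {share(rb, other):.1%} RB")
# ordinal + superlative: 60.5% RB
# sentence-initial "First ," / "Second ,": 87.5% RB

# Counting the 5 LS cases as non-RB as well:
print(f"{share(91, 13 + 5):.1%}")  # 83.5%
```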

Re: ADJ vs. ADV generally, I was pointed to this paper which points out, for example, that adverbs can be postmodifiers of nouns ("his announcement recently that he would resign"). That and other constructions (adjectival compounds, etc.) are used to argue that the distinction cannot be made purely based on what is being modified.

In UD terms, "his announcement recently" is especially awkward because the adverbial vs. adnominal distinction is baked into the deprel, not just the POS.

nschneid • Mar 10 '22 04:03