transform-and-tell
Question about the "rare proper nouns" in the paper.
Hi Alasdair,
I'm curious about the "rare proper nouns" mentioned in your CVPR paper, which are described as "... nouns that appear in a test caption but not in any training caption." I was wondering if I could ask a few questions:
- Are the "rare proper nouns" extracted with a toolkit, the way named entities are (e.g. using spacy)? If so, how are the "rare proper nouns" extracted?
- Is there any difference between the "rare proper nouns" and "named entities" except that the former is "rare"?
- The "rare proper nouns" do not appear in any training caption, but can they appear in training or test news articles?
Thanks very much!
How are the "rare proper nouns" extracted?
We first use spacy to extract proper nouns. You can see the actual get_proper_nouns function we use here. Then, we define a "rare proper noun" as a proper noun that appears in a test caption but not in any training caption (note that we only look at captions and not at the actual article content).
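To make the definition concrete, here is a minimal sketch of the set logic. It is not the repository's code: `proper_nouns_from_doc` assumes that proper nouns are the tokens spacy tags as `PROPN`, and `rare_proper_nouns` and `capitalized_words` are hypothetical helpers introduced only for illustration. The extractor is passed in as a parameter so that the snippet runs without a spacy model installed.

```python
from typing import Callable, Iterable, List, Set


def proper_nouns_from_doc(doc) -> List[str]:
    """Return the text of every token tagged PROPN in a spacy Doc.

    Assumption: this mirrors the general idea of get_proper_nouns,
    not its exact implementation.
    """
    return [token.text for token in doc if token.pos_ == "PROPN"]


def rare_proper_nouns(
    train_captions: Iterable[str],
    test_captions: Iterable[str],
    extract: Callable[[str], List[str]],
) -> Set[str]:
    """Proper nouns found in some test caption but in no training caption."""
    seen_in_training: Set[str] = set()
    for caption in train_captions:
        seen_in_training.update(extract(caption))

    rare: Set[str] = set()
    for caption in test_captions:
        rare.update(n for n in extract(caption) if n not in seen_in_training)
    return rare


def capitalized_words(caption: str) -> List[str]:
    """Crude stand-in extractor (hypothetical): any capitalized word."""
    return [w.strip(".,") for w in caption.split() if w[:1].isupper()]


# With spacy installed, the extractor could instead be:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   extract = lambda text: proper_nouns_from_doc(nlp(text))

train = ["Angela Merkel speaks in Berlin."]
test = ["Angela Merkel meets Jacinda Ardern in Wellington."]
rare = rare_proper_nouns(train, test, capitalized_words)
```

Only captions are passed in, so a name that occurs in article text but never in a training caption still counts as rare under this definition.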
Is there any difference between the "rare proper nouns" and "named entities" except that the former is "rare"?
You can check out our get_entities function here. We use the NER from spacy. I believe that proper nouns and named entities are similar but not completely overlapping concepts. For example, "$1 billion" is a named entity (MONEY) but not a proper noun.
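One way to see where the two concepts diverge is to list the entities whose span contains no proper-noun token. The sketch below is an illustration under assumptions, not the repository's get_entities: both function names are hypothetical, and it relies only on the spacy attributes `doc.ents`, `ent.label_`, and `token.pos_`.

```python
from typing import List, Tuple


def entity_spans(doc) -> List[Tuple[str, str]]:
    """Return (text, label) for every named entity in a spacy Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def entities_without_proper_nouns(doc) -> List[Tuple[str, str]]:
    """Named entities whose span contains no token tagged PROPN.

    Iterating over a spacy Span yields its tokens, so each entity can be
    checked token by token.
    """
    return [
        (ent.text, ent.label_)
        for ent in doc.ents
        if not any(token.pos_ == "PROPN" for token in ent)
    ]


# With spacy installed, usage could look like:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp("Apple paid $1 billion for the startup.")
#   entity_spans(doc)
#   entities_without_proper_nouns(doc)
```

Entities such as monetary amounts, dates, and percentages would typically land in the second list, while person and organization names usually would not.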
The "rare proper nouns" do not appear in any training caption, but can they appear in training or test news articles?
Yes. Since we only process the captions to do the classification, a rare proper noun is absent from every training caption but may still have appeared in the text of a training article.