#NickName is not being tagged and parsing bug

Open bugs181 opened this issue 7 years ago • 2 comments

Compromise version: 11.12.3
Node version: v11.0.0

According to line 6928

 // Dwayne 'the rock' Johnson
ts.match('#FirstName [#Determiner? #Noun] #LastName').tag('#NickName', 'first-noun-last').tag('#Person', 'first-noun-last');

This should tag "The Rock" as a NickName.

Code:

const nlp = require('compromise')

const doc = nlp('Dwayne “The Rock” Johnson')
doc
 .normalize({ honorifics: true })
 .people()
 .debug()

Output:

 ====
    --
    'Dwayne'                   - TitleCase, MaleName, FirstName, Person, Singular, Noun, ProperNoun
    'The'                      - TitleCase, Noun, StartQuotation, Quotation, LastName, Person, Singular, ProperNoun
    'Rock'                     - TitleCase, Noun, EndQuotation, Quotation, Person, Singular, FirstName, ProperNoun
    'Johnson'                  - TitleCase, LastName, Person, Singular, Noun, ProperNoun


The current workaround is to create a plugin:

function knownWords() {
  return {
    words: {
      'The Rock': 'NickName',
    },
  }
}

Output:

    ====
       --
       'Dwayne'                   - TitleCase, MaleName, FirstName, Person, Singular, Noun, ProperNoun
       'The'                      - TitleCase, NickName, Noun, StartQuotation, Quotation, LastName, Person, Singular, ProperNoun
       'Rock'                     - TitleCase, NickName, Noun, EndQuotation, Quotation, Person, Singular, FirstName, ProperNoun
       'Johnson'                  - TitleCase, LastName, Person, Singular, Noun, ProperNoun



In addition to that, the name parser goes bonkers on this type of data.

Example:

doc
  .people()
  .forEach(person => {
    console.log(person.data())
  })

Output:

    [ { text: 'Dwayne "The Rock" Johnson',
        normal: 'dwayne the rock johnson',
        firstName: 'dwayne rock',
        middleName: '',
        nickName: '',
        lastName: 'the johnson',
        genderGuess: 'Male',
        pronoun: 'he',
        honorifics: [] } ]
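
For comparison, here is a rough sketch of the split I would expect, written in plain JavaScript against the tags from the debug output above. `splitName` is a hypothetical helper and not part of compromise; it only assumes that terms tagged `Quotation` inside a name form the nickname.

```javascript
// Terms and tags taken from the debug output above (tag lists shortened).
const terms = [
  { text: 'Dwayne', tags: ['FirstName', 'Person'] },
  { text: 'The', tags: ['StartQuotation', 'Quotation', 'LastName', 'Person'] },
  { text: 'Rock', tags: ['EndQuotation', 'Quotation', 'FirstName', 'Person'] },
  { text: 'Johnson', tags: ['LastName', 'Person'] },
]

// Hypothetical helper (not a compromise API): quoted terms become the
// nickname, and only the unquoted terms are considered for first/last name.
function splitName(terms) {
  const isQuoted = t => t.tags.includes('Quotation')
  const join = list => list.map(t => t.text.toLowerCase()).join(' ')

  const quoted = terms.filter(isQuoted)
  const rest = terms.filter(t => !isQuoted(t))

  return {
    firstName: join(rest.filter(t => t.tags.includes('FirstName'))),
    nickName: join(quoted),
    lastName: join(rest.filter(t => t.tags.includes('LastName'))),
  }
}

console.log(splitName(terms))
// → { firstName: 'dwayne', nickName: 'the rock', lastName: 'johnson' }
```

The point is just that the quoted terms should be set aside before the first/last name tags are read, so their stray `FirstName`/`LastName` tags cannot leak into the result.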

bugs181, Dec 03 '18 13:12

isn't it "rock 'the dwayne' johnson"? ;)

yeah, thanks. looks like the quoted part is getting tagged as 'LastName', which is making it confused.

I'll take a look. cheers

spencermountain, Dec 03 '18 17:12

A potential fix could be to check whether the terms have #StartQuotation or #EndQuotation tags and remove the quotes? I was thinking the .normalize function would do this, but I'm not really sure what it does.
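
A minimal sketch of the quote-removal idea as a pre-processing step on the raw text, in plain JavaScript. `stripQuotes` is a hypothetical helper, not a compromise API, and I have not verified that the tagger handles the unquoted text any better.

```javascript
// Hypothetical helper (not part of compromise): remove straight and curly
// quotation marks that wrap words, leaving in-word apostrophes alone.
function stripQuotes(text) {
  return text
    // opening quotes: at the start of the string or right after whitespace
    .replace(/(^|\s)["'“‘]+/g, '$1')
    // closing quotes: right before whitespace, punctuation, or the end
    .replace(/["'”’]+(?=[\s.,!?;:]|$)/g, '')
}

console.log(stripQuotes('Dwayne “The Rock” Johnson'))
// → Dwayne The Rock Johnson
console.log(stripQuotes("Conan O'Brien"))
// → Conan O'Brien
```

One known limitation: a trailing possessive apostrophe (as in "the Johnsons' car") would also be removed, so this is only a rough workaround.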

bugs181, Dec 03 '18 19:12