compromise
#NickName is not being tagged and parsing bug
Compromise version: 11.12.3
Node version: v11.0.0
According to line 6928, this rule exists:
// Dwayne 'the rock' Johnson
ts.match('#FirstName [#Determiner? #Noun] #LastName').tag('#NickName', 'first-noun-last').tag('#Person', 'first-noun-last');
This should tag "The Rock" as a NickName.
Code:
const doc = nlp('Dwayne “The Rock” Johnson')
doc
  .normalize({ honorifics: true })
  .people()
  .debug()
Output:
====
--
'Dwayne' - TitleCase, MaleName, FirstName, Person, Singular, Noun, ProperNoun
'The' - TitleCase, Noun, StartQuotation, Quotation, LastName, Person, Singular, ProperNoun
'Rock' - TitleCase, Noun, EndQuotation, Quotation, Person, Singular, FirstName, ProperNoun
'Johnson' - TitleCase, LastName, Person, Singular, Noun, ProperNoun
The current workaround is to create a plugin:
function knownWords() {
  return {
    words: {
      'The Rock': 'NickName',
    },
  }
}
Output:
====
--
'Dwayne' - TitleCase, MaleName, FirstName, Person, Singular, Noun, ProperNoun
'The' - TitleCase, NickName, Noun, StartQuotation, Quotation, LastName, Person, Singular, ProperNoun
'Rock' - TitleCase, NickName, Noun, EndQuotation, Quotation, Person, Singular, FirstName, ProperNoun
'Johnson' - TitleCase, LastName, Person, Singular, Noun, ProperNoun
In addition to that, the name parser goes bonkers on this type of data.
Example:
doc
  .people()
  .forEach(person => {
    console.log(person.data())
  })
Output:
[ { text: 'Dwayne "The Rock" Johnson',
normal: 'dwayne the rock johnson',
firstName: 'dwayne rock',
middleName: '',
nickName: '',
lastName: 'the johnson',
genderGuess: 'Male',
pronoun: 'he',
honorifics: [] } ]
isn't it "rock 'the dwayne' johnson"? ;)
yeah, thanks. looks like the quoted part is getting tagged as 'LastName', which is confusing the parser.
I'll take a look. cheers
A potential fix could be to check for the #StartQuotation or #EndQuotation tags and strip the quotes. I thought the .normalize function would do this, but I'm not sure what it actually does.
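To illustrate the idea, here is a rough sketch in plain JavaScript that pulls a quoted nickname out of the string before any name parsing happens. It does not use compromise at all, and `splitQuotedNickname` is a made-up helper name, not something the library provides. It also assumes at most one quoted span per name, so treat it as a starting point rather than a real fix.

```javascript
// Hypothetical helper, NOT part of compromise: separate a quoted
// nickname from the rest of a name before parsing it.
function splitQuotedNickname(name) {
  // Accept straight or curly, single or double quotes around the nickname.
  const quoted = /["'“”‘’]([^"'“”‘’]+)["'“”‘’]/
  const match = name.match(quoted)
  if (!match) {
    // No quoted span found: return the name unchanged.
    return { name: name.trim(), nickName: '' }
  }
  // Drop the quoted span and collapse the leftover whitespace.
  const rest = name.replace(match[0], ' ').replace(/\s+/g, ' ').trim()
  return { name: rest, nickName: match[1].trim() }
}

console.log(splitQuotedNickname('Dwayne “The Rock” Johnson'))
// { name: 'Dwayne Johnson', nickName: 'The Rock' }
```

The remaining `name` could then be handed to `nlp(...).people()` so the first/last name logic never sees the quoted words. A name containing two apostrophes would be misread as a quoted span, so a tag-based check inside the library would likely be more robust.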