compromise
Lexicon terms beginning with # do not get matched
I have a custom lexicon that links hashtags to tags. This worked in an older version of compromise (v11.14) but does not seem to be working in 13.11.4.
// I would expect this to match, but it does not
nlp("#GoJetsGo", { "#GoJetsGo": "SportsTeam" }).match("#SportsTeam").text() // ''
// this can work, however I would like to make sure only the hashtag is matched
nlp("#GoJetsGo GoJetsGo", { "GoJetsGo": "SportsTeam" }).match("#SportsTeam").text() // '#GoJetsGo GoJetsGo'
// if # is not the leading character it does work, so seems to only happen when it's leading
nlp("Go#JetsGo", { "Go#JetsGo": "SportsTeam" }).match("#SportsTeam").text() // 'Go#JetsGo'
This seems like it may be intentional (perhaps the built-in hashTag logic is conflicting?), but I'm having trouble finding anything in the docs that would say so.
edit: this also appears to happen for terms beginning with @, i.e. associating "@NHLJets": "#SportsTeam" will not work either.
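As a stopgap while lexicon keys with a leading # or @ are not matched, the lookup could be done outside the library. This is only a rough sketch in plain JavaScript: `tagPrefixedTerms` is a hypothetical helper, not part of compromise, and it assumes whitespace-separated tokens with no trailing punctuation.

```javascript
// Hypothetical helper, independent of compromise: look up only the tokens
// that start with '#' or '@' in a lexicon keyed by the full prefixed term.
function tagPrefixedTerms(text, lexicon) {
  return text
    .split(/\s+/)
    .filter((word) => /^[#@]/.test(word))
    .filter((word) => Object.prototype.hasOwnProperty.call(lexicon, word))
    .map((word) => ({ term: word, tag: lexicon[word] }))
}

const prefixedLexicon = { '#GoJetsGo': 'SportsTeam', '@NHLJets': 'SportsTeam' }

// The bare word 'GoJetsGo' is skipped; only the prefixed terms are returned.
console.log(tagPrefixedTerms('#GoJetsGo GoJetsGo @NHLJets', prefixedLexicon))
```

The results would still have to be applied to the document by hand, so this does not replace a fix in the lexicon lookup itself.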
hey @mattjennings thanks - this is a good issue. You're right, something is bad. It's started tripping on the TitleCase bit, after the pound symbol. This is bad. I removed an 'i' from a regex a few versions back, and didn't have a test for it.
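To illustrate how a dropped 'i' flag can cause this kind of failure, here is a small sketch. The pattern is a hypothetical stand-in, not the actual regex from the compromise source, which is not shown in this thread.

```javascript
// Hypothetical hashtag patterns (assumed for illustration only).
// Without the 'i' flag, [a-z] rejects the capital letters in a TitleCase tag.
const caseSensitive = /^#[a-z0-9_]+$/
const caseInsensitive = /^#[a-z0-9_]+$/i

const looksLikeHashTag = (word, pattern) => pattern.test(word)

console.log(looksLikeHashTag('#gojetsgo', caseSensitive))   // true
console.log(looksLikeHashTag('#GoJetsGo', caseSensitive))   // false
console.log(looksLikeHashTag('#GoJetsGo', caseInsensitive)) // true
```

A regression test along these lines, with a TitleCase hashtag as input, is the kind of check that would catch a flag being removed again.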
Sorry! here's what i'd do, right now: https://runkit.com/spencermountain/61b7bb4bd4140000092a6925
I will add a proper fix to v14, which will ship in January. I have been thinking about cleaning this stuff up, the timing is good.
Will keep this open, until then. cheers (sorry bout the jets this year)
No problem, I can wait until v14. I appreciate the quick response and the great work you've been doing!