Lexicon terms beginning with # do not get matched

Open mattjennings opened this issue 3 years ago • 2 comments

I have a custom lexicon that links hashtags to tags. This worked in an older version of compromise (v11.14) but does not seem to work in v13.11.4.

// I would expect this to match, but it does not
nlp("#GoJetsGo", { "#GoJetsGo": "SportsTeam" }).match("#SportsTeam").text() // ''

// this can work, however I would like to make sure only the hashtag is matched
nlp("#GoJetsGo GoJetsGo", { "GoJetsGo": "SportsTeam" }).match("#SportsTeam").text() // '#GoJetsGo GoJetsGo'

// if # is not the leading character it does work, so seems to only happen when it's leading
nlp("Go#JetsGo", { "Go#JetsGo": "SportsTeam" }).match("#SportsTeam").text() // '#Go#JetsGo'

This seems like it may be intentional (perhaps the built-in hashTag logic is conflicting?), but I'm having trouble finding anything in the docs that would say so.

Edit: this also appears to happen for terms beginning with @, e.g. associating "@NHLJets": "#SportsTeam" will not work either.

mattjennings, Dec 13 '21 01:12

hey @mattjennings thanks - this is a good issue. You're right, something is broken. It has started tripping on the TitleCase bit after the pound symbol. I removed an 'i' (case-insensitive) flag from a regex a few versions back, and didn't have a test for it.

Sorry! Here's what I'd do right now: https://runkit.com/spencermountain/61b7bb4bd4140000092a6925
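(The contents of the linked notebook are not reproduced here. As a rough stopgap that does not depend on compromise's lexicon matching at all, one could look up hashtag and mention tokens in plain JavaScript before or alongside the nlp call. The sketch below is only an illustration: `findLexiconTokens` is a hypothetical helper, not a compromise API, and it assumes tokens are whitespace-separated and that matching should be case-sensitive.)

```javascript
// Hypothetical helper, NOT part of compromise: looks up whitespace-separated
// tokens in a plain-object lexicon, keeping any leading '#' or '@'.
function findLexiconTokens(text, lexicon) {
  const hits = [];
  for (const token of text.split(/\s+/)) {
    // Strip trailing punctuation such as ',' or '!' but keep the leading symbol.
    const cleaned = token.replace(/[.,!?;:]+$/, "");
    // Own-property check avoids false hits on inherited keys like "toString".
    if (Object.prototype.hasOwnProperty.call(lexicon, cleaned)) {
      hits.push({ text: cleaned, tag: lexicon[cleaned] });
    }
  }
  return hits;
}

// Same lexicon shape as in the report above: term -> tag name.
const lexicon = { "#GoJetsGo": "SportsTeam", "@NHLJets": "SportsTeam" };
const hits = findLexiconTokens("#GoJetsGo GoJetsGo, says @NHLJets!", lexicon);
console.log(hits);
```

Because the lookup is an exact string comparison, the bare word GoJetsGo is not matched here; only the tokens that carry the leading # or @ are returned, which is the behaviour the original report asks for.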

I will add a proper fix to v14, which will ship in January. I have been thinking about cleaning this stuff up, the timing is good.

Will keep this open, until then. cheers (sorry bout the jets this year)

spencermountain, Dec 13 '21 21:12

No problem, I can wait until v14. I appreciate the quick response and the great work you've been doing!

mattjennings, Dec 13 '21 23:12