nlp.js
Isolate specific entity
I would like to know how to isolate a specific entity. I don't know whether this is a bug, but I would like to isolate each entity in my intent with this pattern:
'%BOOK% %PAGE_START% %PARAGRAPH_START%'
In the result, PAGE_START and PARAGRAPH_START each appear twice:
...
"intent": "[BOOK] search_paragraph",
"domain": "default",
"score": 0.9987136557407928,
"entities": [
  { "start": 0, "end": 2, "len": 3, "levenshtein": 0, "accuracy": 1, "option": "DAILY_PLANET", "sourceText": "Daily", "entity": "BOOK", "utteranceText": "dai" },
  { "start": 4, "end": 4, "len": 1, "levenshtein": 0, "accuracy": 1, "option": "1", "sourceText": "2", "entity": "PAGE_START", "utteranceText": "2" },
  { "start": 6, "end": 6, "len": 1, "levenshtein": 0, "accuracy": 1, "option": "1", "sourceText": "3", "entity": "PAGE_START", "utteranceText": "3" },
  { "start": 4, "end": 4, "len": 1, "levenshtein": 0, "accuracy": 1, "option": "1", "sourceText": "2", "entity": "PARAGRAPH_START", "utteranceText": "2" },
  { "start": 6, "end": 6, "len": 1, "levenshtein": 0, "accuracy": 1, "option": "1", "sourceText": "3", "entity": "PARAGRAPH_START", "utteranceText": "3" }
],
...
I would like only 3 entities in the response, without the duplicated PAGE_START and PARAGRAPH_START:
- BOOK (value: 'DAILY_PLANET'...)
- PAGE_START (value: 1, 2, 3, 4...)
- PARAGRAPH_START (value: 1, 2, 3, 4...)
How can I get that, please? Is it a bug?
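One possible workaround, sketched here as an assumption and not as an nlp.js feature, is to post-process the returned entities and keep one match per placeholder in the order the pattern lists them. The helper name `pickByPatternOrder` is hypothetical, and the sample array is a trimmed copy of the response above.

```javascript
// Hypothetical post-processing step; nothing here calls nlp.js itself.
// For each entity name, in pattern order, keep the first match that
// starts after the previously accepted match.
function pickByPatternOrder(entities, order) {
  const picked = [];
  let cursor = -1; // end offset of the last accepted entity
  for (const name of order) {
    const match = entities
      .filter((e) => e.entity === name && e.start > cursor)
      .sort((a, b) => a.start - b.start)[0];
    if (match) {
      picked.push(match);
      cursor = match.end;
    }
  }
  return picked;
}

// Trimmed copy of the "entities" array from the response above.
const entities = [
  { start: 0, end: 2, entity: 'BOOK', option: 'DAILY_PLANET', utteranceText: 'dai' },
  { start: 4, end: 4, entity: 'PAGE_START', utteranceText: '2' },
  { start: 6, end: 6, entity: 'PAGE_START', utteranceText: '3' },
  { start: 4, end: 4, entity: 'PARAGRAPH_START', utteranceText: '2' },
  { start: 6, end: 6, entity: 'PARAGRAPH_START', utteranceText: '3' },
];

const result = pickByPatternOrder(entities, ['BOOK', 'PAGE_START', 'PARAGRAPH_START']);
console.log(result.map((e) => `${e.entity}=${e.utteranceText}`).join(', '));
```

This only works when the utterance follows the pattern order (book, then page, then paragraph); it does not help if users give the numbers in a different order.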
Hi Jérémie,
I have a related problem with builtin ER.
In your case, how do you disambiguate between a page number and a paragraph number?
Maybe use regexes like p[. ](\d+) and §(\d+), so that a number is not recognized as both a page and a paragraph? It could work if you explain to users that § is the paragraph sign in French (SHIFT+! on AZERTY keyboards).
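To show how the two prefixed patterns keep the numbers apart, here is a minimal sketch in plain JavaScript. It does not use nlp.js; `extractLocation` is a hypothetical helper and the sample utterance is made up for illustration.

```javascript
// The two patterns suggested above, unchanged.
const PAGE_RE = /p[. ](\d+)/;   // "p.2" or "p 2"
const PARAGRAPH_RE = /§(\d+)/;  // "§3"

// Hypothetical helper: returns the page and paragraph numbers found in
// the utterance, or null for whichever prefix is missing.
function extractLocation(utterance) {
  const page = PAGE_RE.exec(utterance);
  const paragraph = PARAGRAPH_RE.exec(utterance);
  return {
    pageStart: page ? Number(page[1]) : null,
    paragraphStart: paragraph ? Number(paragraph[1]) : null,
  };
}

console.log(extractLocation('Daily p.2 §3'));
// A bare number without a prefix matches neither pattern.
console.log(extractLocation('Daily 2 3'));
```

Because each number is tied to its own prefix, no number can match both patterns, but users have to type the prefixes.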
Can you share your lines or XLS?
In v4 the response should already be different, but still not what you need. I am preparing a PR for it, also in connection with #1174, but it will most likely need to wait until my other 4 PRs are merged. The code is starting to overlap, so otherwise it would become merge hell.
Closing due to inactivity. Please, re-open if you think the topic is still alive.