snips-nlu
language-independent slot values
It would be great to be able to define slots with a language-independent value. For instance, we could have an intent callContact, to make a phone call, with a contactName slot and a phoneType slot specifying which of the contact's possibly several phones should be called. The slot value LANDLINE would map to the different ways of naming a landline in each supported language.
Hi @syl22-00, you can use synonyms to map different values in various languages to the same reference value (LANDLINE in your case). In your example, this could look like the following in English:
```yaml
# phoneType entity in English
---
type: entity
name: phoneType
values:
- [LANDLINE, home, landline]
- [MOBILE, mobile, cellphone]
```
And in French:
```yaml
# phoneType entity in French
---
type: entity
name: phoneType
values:
- [LANDLINE, maison, domicile, fixe]
- [MOBILE, mobile, portable, téléphone portable]
```
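To make the behaviour concrete, here is a minimal Python sketch of the lookup these synonym lists imply, assuming (as in the YAML above) that the first element of each list is the reference value. `build_index` and `resolve` are hypothetical helpers, not snips-nlu functions, and the case-insensitive comparison is a simplification. Note the last call: because the reference value is itself an accepted string, `landline` resolves even in the French entity.

```python
from __future__ import annotations

# Hypothetical helpers, not the snips-nlu API: each synonym group is a list whose
# first element is the reference value, mirroring the YAML entities above.
PHONE_TYPE = {
    "en": [["LANDLINE", "home", "landline"], ["MOBILE", "mobile", "cellphone"]],
    "fr": [
        ["LANDLINE", "maison", "domicile", "fixe"],
        ["MOBILE", "mobile", "portable", "téléphone portable"],
    ],
}


def build_index(groups: list[list[str]]) -> dict[str, str]:
    """Map every accepted string, reference value included, to its reference value."""
    index: dict[str, str] = {}
    for group in groups:
        reference = group[0]
        for word in group:
            # Case-insensitive keys: a simplification for this sketch.
            index[word.lower()] = reference
    return index


def resolve(language: str, matched: str) -> str | None:
    """Return the reference value for a matched string, or None if it is unknown."""
    return build_index(PHONE_TYPE[language]).get(matched.lower())


print(resolve("fr", "portable"))  # MOBILE
print(resolve("en", "home"))      # LANDLINE
print(resolve("en", "maison"))    # None: French synonyms are not in the English entity
print(resolve("fr", "landline"))  # LANDLINE: the reference value itself is accepted
```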
Would that be a suitable solution?
My understanding of that solution is that the reference value is itself part of the list of accepted words, so it could cause confusion if it is also a valid word in a non-English language, possibly with a very different meaning. For example, if you want an entity to catch a dish type, `ENTREE` means a main course in American English but a starter in French.
The right way could be to use some kind of namespace in those reference values, to make sure they are only ever used as reference values and never appear as-is in the input: `PHONETYPE-LANDLINE`, `PHONETYPE-MOBILE`, `DISHTYPE-ENTREE`.
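A minimal sketch of the namespacing workaround, assuming hypothetical helpers `namespaced`, `strip_namespace` and `ambiguous_strings` (none of them part of snips-nlu) and an illustrative French `dishType` entity. It flags strings accepted by more than one synonym group, then shows that prefixing the reference values removes the clash.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical helpers, not part of snips-nlu.


def namespaced(entity: str, value: str) -> str:
    """('dishType', 'ENTREE') -> 'DISHTYPE-ENTREE'."""
    return f"{entity.upper()}-{value.upper()}"


def strip_namespace(reference: str) -> str:
    """'DISHTYPE-ENTREE' -> 'ENTREE'; assumes the entity name has no hyphen."""
    return reference.split("-", 1)[1]


def ambiguous_strings(groups: list[list[str]]) -> dict[str, list[str]]:
    """Accepted strings (case-insensitive) claimed by more than one group.

    The first element of each group is the reference value and, as with the
    YAML entities above, is itself an accepted string.
    """
    owners: dict[str, set[str]] = defaultdict(set)
    for group in groups:
        for word in group:
            owners[word.lower()].add(group[0])
    return {word: sorted(refs) for word, refs in owners.items() if len(refs) > 1}


# Illustrative French dishType groups, where "entrée"/"entree" means a starter.
plain = [["ENTREE", "plat principal"], ["STARTER", "entrée", "entree"]]
prefixed = [
    [namespaced("dishType", "ENTREE"), "plat principal"],
    [namespaced("dishType", "STARTER"), "entrée", "entree"],
]

print(ambiguous_strings(plain))            # {'entree': ['ENTREE', 'STARTER']}
print(ambiguous_strings(prefixed))         # {}
print(strip_namespace("DISHTYPE-ENTREE"))  # ENTREE
```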
Does that sound correct?
Many thanks for your help.
Indeed, the reference value will be considered as a valid entity value. The namespacing solution that you mentioned is a valid workaround.
There is a feature we discussed recently that would properly address this use case: an optional `value_id` attached to each group of synonyms. This new attribute would only be used in the final resolution step of the NLU and would not be considered a valid entity value, hence avoiding any confusion.
In the dataset, that could look like this:
```json
{
  "phoneType": {
    "data": [
      {
        "value": "landline",
        "value_id": "LANDLINE",
        "synonyms": [
          "home"
        ]
      },
      {
        "value": "mobile",
        "value_id": "MOBILE",
        "synonyms": [
          "cellphone"
        ]
      }
    ],
    "use_synonyms": true,
    "automatically_extensible": false,
    "matching_strictness": 1.0
  }
}
```
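Since the proposal is not implemented, the following is only a plain-Python sketch, assuming the JSON layout above: `value` and `synonyms` become matchable strings, while `value_id` is carried along but never added as a key. `build_index` is a hypothetical helper, and exact-string keys are a simplification; real entity matching is more involved.

```python
from __future__ import annotations

import json

# Sketch of how the proposed (not yet implemented) "value_id" field could be
# consumed. Plain Python over the dataset JSON above, not snips-nlu code.
ENTITIES = json.loads("""
{
  "phoneType": {
    "data": [
      {"value": "landline", "value_id": "LANDLINE", "synonyms": ["home"]},
      {"value": "mobile", "value_id": "MOBILE", "synonyms": ["cellphone"]}
    ],
    "use_synonyms": true,
    "automatically_extensible": false,
    "matching_strictness": 1.0
  }
}
""")


def build_index(entity: dict) -> dict[str, tuple[str, str | None]]:
    """Map each matchable string to (resolved value, value_id)."""
    index: dict[str, tuple[str, str | None]] = {}
    for entry in entity["data"]:
        resolved = (entry["value"], entry.get("value_id"))  # value_id is optional
        index[entry["value"]] = resolved
        if entity.get("use_synonyms", True):
            for synonym in entry["synonyms"]:
                index[synonym] = resolved
        # value_id is deliberately NOT added as a key: it can never be matched.
    return index


index = build_index(ENTITIES["phoneType"])
print(index["home"])          # ('landline', 'LANDLINE')
print(index.get("LANDLINE"))  # None
print(sorted(index))          # ['cellphone', 'home', 'landline', 'mobile']
```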
And in the parsing result:
```json
{
  "slots": [
    {
      "range": {
        "start": 22,
        "end": 26
      },
      "rawValue": "home",
      "value_id": "LANDLINE",
      "value": {
        "kind": "Custom",
        "value": "landline"
      },
      "entity": "phoneType",
      "slotName": "phoneType"
    }
  ]
}
```
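A hypothetical sketch of the final resolution step that could produce the slot above, assuming an index like one built from the dataset and the illustrative utterance "I want to call Bob at home", which places `home` at characters 22 to 26. `make_slot` and `INDEX` are made-up names, and `range.end` is treated as exclusive, consistent with the 4-character `home` spanning 22 to 26 in the example.

```python
from __future__ import annotations

import json

# Hypothetical resolution step: attach the resolved value and the proposed
# value_id to an extracted span. INDEX contents are illustrative assumptions.
INDEX = {
    "home": ("landline", "LANDLINE"),
    "landline": ("landline", "LANDLINE"),
    "cellphone": ("mobile", "MOBILE"),
    "mobile": ("mobile", "MOBILE"),
}


def make_slot(text: str, raw_value: str, entity: str, slot_name: str) -> dict:
    start = text.index(raw_value)  # first occurrence; raises ValueError if absent
    value, value_id = INDEX[raw_value]
    return {
        "range": {"start": start, "end": start + len(raw_value)},  # end is exclusive
        "rawValue": raw_value,
        "value_id": value_id,
        "value": {"kind": "Custom", "value": value},
        "entity": entity,
        "slotName": slot_name,
    }


slot = make_slot("I want to call Bob at home", "home", "phoneType", "phoneType")
print(json.dumps({"slots": [slot]}, indent=2))
print(slot["range"])  # {'start': 22, 'end': 26}
```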
This is still at the discussion and specification stage, though.
That would be great.
Thanks a lot for the support.
Has `value_id` been implemented yet?