snips-nlu

trailing curly brace being stripped from slot value

Open Shotgun167 opened this issue 6 years ago • 3 comments

I have some values in slots that are surrounded by curly braces and are meant to be returned as is. Instead, the trailing brace is being stripped. "${website}" becomes "${website". I have training examples where the whole "${website}" is included. Is there a way to change this behavior?

Shotgun167 avatar May 01 '19 20:05 Shotgun167

@Shotgun167, this is indeed a limitation of the current tokenization, which strips some punctuation. However, the full "${website}" value should still be available in the resolved value field:

{
  "input": "go to ${website}",
  "intent": {
    "intentName": "go_to_url",
    "probability": 1.0
  },
  "slots": [
    {
      "entity": "url",
      "range": {
        "end": 15,
        "start": 6
      },
      "rawValue": "${website",  # TRUNCATED VALUE HERE
      "slotName": "url",
      "value": {
        "kind": "Custom",
        "value": "${website}"  # FULL VALUE HERE
      }
    }
  ]
}
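As a rough sketch of how a caller could rely on the resolved value rather than `rawValue`, the helper below reads `value.value` from each slot and only falls back to `rawValue` when no resolved value is present. The function name `resolved_slot_values` is hypothetical (it is not part of snips-nlu), and the dict is simply the parse result shown above, minus the annotations.

```python
def resolved_slot_values(parse_result):
    """Map each slot name to its resolved value, falling back to rawValue.

    Hypothetical helper, not part of snips-nlu; it only assumes the
    result layout shown in the example output above.
    """
    values = {}
    for slot in parse_result.get("slots") or []:
        resolved = (slot.get("value") or {}).get("value")
        values[slot["slotName"]] = (
            resolved if resolved is not None else slot["rawValue"]
        )
    return values


# The parse result from the example above, without the annotations.
result = {
    "input": "go to ${website}",
    "intent": {"intentName": "go_to_url", "probability": 1.0},
    "slots": [
        {
            "entity": "url",
            "range": {"start": 6, "end": 15},
            "rawValue": "${website",
            "slotName": "url",
            "value": {"kind": "Custom", "value": "${website}"},
        }
    ],
}

print(resolved_slot_values(result))
```

Note that this only helps when the entity value is resolved against known values from the dataset, as in this example; the `range` still points at the truncated span.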

The mid-term plan is to have a tokenizer component that can be customized through the NLU configuration.

adrienball avatar May 03 '19 14:05 adrienball

I am working around it right now. I substitute a crazy string for the trailing punctuation before parsing, and then swap it back out of the response. This is a nasty, ugly hack that makes code reviewers cry. So, I look forward to the customizable parser.
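For anyone landing here with the same problem, a workaround of this kind might look roughly like the sketch below. It is not the exact code described above: the placeholder string, the helper names, and the `fake_parse` stand-in are all assumptions made for illustration. The wrapper takes any callable that accepts a text and returns a dict, so a fitted engine's `parse` method could be passed in instead of the stand-in.

```python
# Hypothetical placeholder: pick an alphanumeric string that never
# occurs in real input, so the tokenizer will not strip it.
PLACEHOLDER = "XRBRACEX"


def protect(text):
    """Swap every closing brace for the placeholder before parsing."""
    return text.replace("}", PLACEHOLDER)


def restore(obj):
    """Recursively swap the placeholder back in every string of a result."""
    if isinstance(obj, str):
        return obj.replace(PLACEHOLDER, "}")
    if isinstance(obj, list):
        return [restore(item) for item in obj]
    if isinstance(obj, dict):
        return {key: restore(value) for key, value in obj.items()}
    return obj  # numbers, None, booleans are left untouched


def parse_preserving_braces(parse, text):
    """Run `parse` on the protected text and restore braces in its output."""
    return restore(parse(protect(text)))


def fake_parse(text):
    """Stand-in parser that mimics the trailing-brace stripping."""
    token = text.split()[-1]
    return {"input": text, "slots": [{"rawValue": token.rstrip("}")}]}


out = parse_preserving_braces(fake_parse, "go to ${website}")
print(out["slots"][0]["rawValue"])
```

Two caveats with this approach: any character offsets in the response (such as `range`) refer to the protected text, not the original input, and the training examples would presumably need the same substitution so the model sees consistent values.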

Is it possible to guesstimate a timeframe? And, yes, I do have time to offer help, though I will not swear that I have the relevant expertise.

Shotgun167 avatar May 03 '19 15:05 Shotgun167

It is not prioritized yet, so I can't give you a good ETA, but I think this could be done within the next 3 months.

adrienball avatar May 03 '19 16:05 adrienball