Standard handler for Numbers

Open aaronchantrill opened this issue 4 years ago • 2 comments

Detailed Description

We would like to provide a special keyword for numbers, as opposed to the plugin-author-defined keywords like {ColorKeyword} or {DayKeyword}, because recognizing and parsing a number is both more complex and extremely common. I propose using either square brackets ("[NUMBER]") or colons ("{:NUMBER:}") to distinguish system keywords from plugin keywords. Eventually I would like to have system keywords for Number, Date, and Time, and I'm sure others will arise as we work on them.

Context

Right now, it would be difficult for an author to simply ask for a number in the template. For instance, "WHAT IS {NumberKeyword} PLUS {NumberKeyword}" would require listing every possible number in the template itself (NumberKeyword: [ONE, TWO, THREE, ...]), which would be incredibly time consuming, and when parsed into an expanded form would make the template take up as many lines as the numbers you added. In addition, there are numerous ways to say each number: one person might say 'ONE NINE SIX FIVE', another 'ONE THOUSAND NINE HUNDRED SIXTY FIVE', another 'NINETEEN SIXTY FIVE', etc. This quickly becomes overwhelming.

Possible Implementation

There are rules for how numbers can be constructed. You might say, for instance, "ONE HUNDRED THOUSAND", but you wouldn't say "THOUSAND" on its own. Since most language dictionaries are based on trigrams, I should be able to generate a set of trigrams for speaking numbers (ONE, ONE HUNDRED, ONE HUNDRED THOUSAND, SEVENTEEN OH ONE, SEVENTEEN THOUSAND AND, etc.) and then insert into the basic template only a list of words that may appear first or last in a number. This should allow the language model to insert the full trigram model in its place.
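The trigram idea above can be sketched roughly as follows. This is only an illustration, not Naomi code: the names `spell`, `trigrams`, and `build_model` are hypothetical, the `<s>` and `</s>` padding tokens are an assumption, and only one spoken style is generated (so forms like SEVENTEEN OH ONE or NINETEEN SIXTY FIVE are not covered).

```python
from collections import Counter

ONES = ["ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN",
        "EIGHT", "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN",
        "FIFTEEN", "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN"]
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
        "EIGHTY", "NINETY"]


def spell(n):
    """Spell an integer in one common English style (hypothetical helper)."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0 <= n < 1,000,000")
    if n < 20:
        return [ONES[n]]
    if n < 100:
        return [TENS[n // 10]] + ([ONES[n % 10]] if n % 10 else [])
    if n < 1000:
        return [ONES[n // 100], "HUNDRED"] + (spell(n % 100) if n % 100 else [])
    return spell(n // 1000) + ["THOUSAND"] + (spell(n % 1000) if n % 1000 else [])


def trigrams(words):
    """Return word trigrams, padded with assumed sentence-boundary tokens."""
    padded = ["<s>"] + words + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


def build_model(limit):
    """Collect trigram counts plus the words that can start or end a number."""
    counts, first_words, last_words = Counter(), set(), set()
    for n in range(limit):
        words = spell(n)
        counts.update(trigrams(words))
        first_words.add(words[0])
        last_words.add(words[-1])
    return counts, first_words, last_words


counts, first_words, last_words = build_model(101_000)
print("THOUSAND" in first_words)  # False: a number never starts with THOUSAND
print("THOUSAND" in last_words)   # True: ONE THOUSAND ends with it
```

The `first_words` and `last_words` sets correspond to the words that would be inserted into the basic template, while `counts` stands in for the number trigram model.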

aaronchantrill avatar Apr 04 '21 16:04 aaronchantrill

I've been working on this somewhat at Numbers.

The following are basic number words: 'ZERO', 'ONE', 'TWO', 'THREE', 'FOUR', 'FIVE', 'SIX', 'SEVEN', 'EIGHT', 'NINE', 'TEN', 'ELEVEN', 'TWELVE', 'THIRTEEN', 'FOURTEEN', 'FIFTEEN', 'SIXTEEN', 'SEVENTEEN', 'EIGHTEEN', 'NINETEEN', 'TWENTY', 'THIRTY', 'FORTY', 'FIFTY', 'SIXTY', 'SEVENTY', 'EIGHTY', 'NINETY', 'HUNDRED', 'THOUSAND', 'MILLION', 'BILLION', 'QUADRILLION'

'A' can be considered a number word if directly followed by a number (other than ONE): 'A HUNDRED'

'OH' can be considered a number word if it occurs next to another number: 'NINE OH TWO ONE OH'

'AND' can be considered a number word if it is both preceded and followed by a number word: 'TWO AND TWENTY', 'TWO AND A HALF'
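As a rough sketch of how these rules might be applied to a tokenized transcription (the function names `is_number_word` and `number_spans` are hypothetical, and "a number" is interpreted here as one of the basic number words listed above):

```python
# Basic number words, copied from the list above.
NUMBER_WORDS = {
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
    "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
    "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN", "TWENTY", "THIRTY",
    "FORTY", "FIFTY", "SIXTY", "SEVENTY", "EIGHTY", "NINETY", "HUNDRED",
    "THOUSAND", "MILLION", "BILLION", "QUADRILLION",
}


def is_number_word(tokens, i):
    """Apply the contextual rules for A, OH and AND to tokens[i]."""
    word = tokens[i]
    if word in NUMBER_WORDS:
        return True
    prev_is_number = i > 0 and tokens[i - 1] in NUMBER_WORDS
    next_is_number = i + 1 < len(tokens) and tokens[i + 1] in NUMBER_WORDS
    if word == "A":
        # 'A HUNDRED' counts, 'A ONE' does not.
        return next_is_number and tokens[i + 1] != "ONE"
    if word == "OH":
        return prev_is_number or next_is_number
    if word == "AND":
        return prev_is_number and next_is_number
    return False


def number_spans(tokens):
    """Group consecutive number words into spans."""
    spans, current = [], []
    for i, word in enumerate(tokens):
        if is_number_word(tokens, i):
            current.append(word)
        elif current:
            spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


print(number_spans("WHAT IS A HUNDRED PLUS TWO AND TWENTY".split()))
# [['A', 'HUNDRED'], ['TWO', 'AND', 'TWENTY']]
```

Under this strict reading, the AND in 'TWO AND A HALF' is not accepted, because neither A nor HALF is a basic number word; fractions would need an extra rule.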

I think what we need to do is create a special placemarker for numbers, then scan the transcription for number words and replace them with the placemarker before passing the transcription to the intent parser. After the intent parser does its work, the numbers are placed back into a NUMBERS match group which lists all the numbers.

I'm not sure how to handle number associations. For example:

Count to 100 from 1 -- ONE, TWO, THREE, ... , ONE HUNDRED
Count from 100 to 1 -- ONE HUNDRED, NINETY NINE, NINETY EIGHT, ... ONE
Count from 1 to 100 -- ONE, TWO, THREE, ... , ONE HUNDRED
Count to 1 from 100 -- ONE HUNDRED, NINETY NINE, NINETY EIGHT, ... ONE

In this case the order of the numbers and the order of the prepositions both matter and can change the meaning of the command. Unfortunately, I think the plugin author would have to analyze the exact phrase to determine the beginning and ending points of the count.
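One way a plugin author might analyze the phrase is to key each number to the preposition in front of it rather than to its position. The following is a minimal sketch under the assumption that the numbers have already been converted to digits, as in the examples above; `count_bounds` and `count_sequence` are hypothetical names.

```python
import re


def count_bounds(phrase):
    """Return (start, end) taken from 'from N' and 'to N', in either order."""
    start = re.search(r"\bfrom\s+(\d+)", phrase, re.IGNORECASE)
    end = re.search(r"\bto\s+(\d+)", phrase, re.IGNORECASE)
    if not (start and end):
        return None
    return int(start.group(1)), int(end.group(1))


def count_sequence(phrase):
    """Count up or down depending on which bound is larger."""
    bounds = count_bounds(phrase)
    if bounds is None:
        return []
    start, end = bounds
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))


print(count_bounds("Count to 100 from 1"))  # (1, 100)
print(count_sequence("Count from 3 to 1"))  # [3, 2, 1]
```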

aaronchantrill avatar Sep 20 '21 02:09 aaronchantrill

I'm planning to do slot types now. If you want a numeric value, you would create a template like "Count from {from:number} to {to:number}". When we pass the templates to the intent parser, we will replace each typed slot such as {from:number} with just a generic number placemarker, so the intent parser will see a template in which both slots are that same placemarker.

Then there will be a pre-parser that will look for any groups of number words using regex and put them into the matches for the variant, replacing the original locations with the same placemarker. The text passed to the intent parser then has the same shape as the stripped template, so it should match the correct template.

Once the template is matched, then the identities of the numbers ("from" and "to") will be looked up and matched to the numbers.

It will be the responsibility of the template author to check that the numbers returned are in the correct range.
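A minimal sketch of this slot-type flow, for illustration only: the placemarker token `NUMBERSLOT` and the function names are assumptions rather than anything in Naomi, and an exact string comparison stands in for the real intent parser.

```python
import re

PLACEHOLDER = "NUMBERSLOT"  # hypothetical placemarker token, not Naomi's
SLOT_RE = re.compile(r"\{(\w+):number\}")
NUMBER_WORDS = [
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
    "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
    "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN", "TWENTY", "THIRTY",
    "FORTY", "FIFTY", "SIXTY", "SEVENTY", "EIGHTY", "NINETY", "HUNDRED",
    "THOUSAND", "MILLION", "BILLION", "QUADRILLION",
]
WORD = "(?:" + "|".join(NUMBER_WORDS) + ")"
# One or more whole number words separated by whitespace.
NUMBER_RUN_RE = re.compile(r"\b" + WORD + r"\b(?:\s+" + WORD + r"\b)*")


def strip_slots(template):
    """Replace {name:number} slots with the placemarker; keep the slot names."""
    names = SLOT_RE.findall(template)
    return SLOT_RE.sub(PLACEHOLDER, template).upper(), names


def preparse(utterance):
    """Replace each run of number words with the placemarker; keep the runs."""
    numbers = []

    def swap(match):
        numbers.append(match.group(0))
        return PLACEHOLDER

    return NUMBER_RUN_RE.sub(swap, utterance.upper()), numbers


def fill_slots(template, utterance):
    """Match the stripped template, then pair slot names with numbers in order."""
    pattern, names = strip_slots(template)
    text, numbers = preparse(utterance)
    if pattern != text or len(names) != len(numbers):
        return None
    return dict(zip(names, numbers))


print(fill_slots("Count from {from:number} to {to:number}",
                 "count from one to one hundred"))
# {'from': 'ONE', 'to': 'ONE HUNDRED'}
```

The values come back as spoken words; converting them to integers and checking their range would still be up to the template author.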

aaronchantrill avatar Nov 06 '22 13:11 aaronchantrill