tue_robocup
PicoVoice Rhino integration
We want to integrate PicoVoice Rhino for intent recognition.
Our current NLU (Natural Language Understanding) pipeline is:
- Ask a question
- Speech to Text (on robot)
- Parse text according to a grammar+target into 'semantics' (mapping of parameters to values)
- Take action based on the semantics (ask another question or go do something)
Out of these, PicoVoice Rhino handles:
- Speech to Text
- Parsing text to 'semantics': an 'intent' with some parameters (e.g. intent 'bringItem' with parameters/slots specifying what item to bring and from where to where to bring the item)
So we don't have to run speech recognition or use the grammar parser ourselves. We still have to interpret this information and act on it, of course.
There is a downside though: the APIs we've developed around the NLU pipeline work with a grammar that specifies which sentences are acceptable and which words fill which parameters. That grammar still exists, but it now lives on the PicoVoice console.
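As a rough illustration of that remaining interpretation step, the sketch below flattens a Rhino-style result (an intent name plus a slot mapping) into a 'semantics' dictionary. `RhinoResult` and `to_semantics` are hypothetical names; the fields only mirror the `is_understood`, `intent` and `slots` attributes of the inference object in the Rhino Python SDK, and nothing here is taken from hmi_picovoice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class RhinoResult:
    """Hypothetical stand-in for a Rhino inference result."""
    is_understood: bool
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def to_semantics(result: RhinoResult) -> Optional[Dict[str, Optional[str]]]:
    """Flatten an understood result into a semantics dict; None if not understood.

    Assumes no slot is itself called 'intent', otherwise it would overwrite the key.
    """
    if not result.is_understood:
        return None
    semantics: Dict[str, Optional[str]] = {"intent": result.intent}
    semantics.update(result.slots)
    return semantics


# Example values are made up for illustration.
example = RhinoResult(
    is_understood=True,
    intent="bringItem",
    slots={"item": "coke", "source": "kitchen", "destination": "living room"},
)
print(to_semantics(example))
```

The state machine would still decide what to do with the resulting dictionary, exactly as it does with the output of the grammar parser today.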
PicoVoice concepts
In PicoVoice, there are some concepts to know:
- Expression: An Intent can be expressed with different sentences and sentence structures. E.g. 'Get me item A from the kitchen' or 'Go to the kitchen, get me A and bring it to me' both have the same meaning and intent, but a very different structure.
  - These expressions are comparable to the grammar definitions the grammar_parser uses.
- Intent: A way to interpret a user's command, e.g. bringItem, makeCoffee.
  - Comparable in function to the Target that the grammar_parser uses.
- Slot: An Intent can fill some slots, e.g. what item to bring from where to where, what kind of coffee to make, etc. These parametrize the command.
- Context: A collection of Intents that have some commonality and relation to each other.
  - Roughly comparable to an overall grammar definition for the grammar_parser.
  - These are referred to via a context_url.
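To make the relationship between these concepts concrete, here is a minimal sketch of them as plain data classes. The class names, the `<item>` placeholder notation in the expressions and the `context_url` value are all hypothetical; this is not how PicoVoice stores a Context internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    """One way to interpret a command, with the sentences that express it."""
    name: str
    expressions: List[str]
    slots: List[str] = field(default_factory=list)


@dataclass
class Context:
    """A collection of related Intents, referred to via a context_url."""
    context_url: str
    intents: Dict[str, Intent] = field(default_factory=dict)

    def add(self, intent: Intent) -> None:
        self.intents[intent.name] = intent


# Hypothetical context_url and expressions, for illustration only.
gpsr = Context(context_url="example_context")
gpsr.add(Intent(
    name="bringItem",
    expressions=[
        "get me <item> from the <source>",
        "go to the <source>, get me <item> and bring it to me",
    ],
    slots=["item", "source"],
))
print(sorted(gpsr.intents))
```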
TODO
We'll somehow have to map the grammars and targets we've used with the grammar_parser to their PicoVoice equivalents.
We can't send a grammar and expect that to be recognized. Instead, we have to create Intents (with expressions and slots etc), gather them into a Context and refer to those instead of sending a grammar.
Many of the grammars are not hardcoded in the challenge state machines directly but are imported from robocup_knowledge, which could save some work.
- [x] Create HMI server for PicoVoice: https://github.com/tue-robotics/hmi_picovoice
- [x] Set up Amigo/TechUnited PicoVoice account? (@PetervDooren has the credentials)
- [ ] Define Contexts+Intents+Slots+Expressions for all our grammars.
  - [x] yes/no intent
  - [x] declareName
  - [ ] ...
- [ ] Replace our use of grammars within the RoboCup challenges with context_urls and intents
Integration with Challenges
Because the grammar-based HmiQuery-API is still quite useful and is used with e.g. Telegram and other HMI servers, it may be better to create a second API that reflects that PicoVoice (and other similar services) take care of a larger part of the NLU pipeline.
Both these APIs are useful at the same time. Ideally, we can use the hmi framework to query the user via Telegram and PicoVoice at the same time.
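One possible shape for querying several backends at once is sketched below: every backend gets the same question and the first answer to arrive wins. `query_all` and the two fake backends are hypothetical and do not reflect the actual hmi framework API; a real implementation would also need to cancel or ignore the slower backend.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict


def query_all(backends: Dict[str, Callable[[str], dict]], question: str) -> dict:
    """Ask every backend the same question and return the first answer that arrives."""
    executor = ThreadPoolExecutor(max_workers=len(backends))
    futures = {executor.submit(ask, question): name for name, ask in backends.items()}
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    executor.shutdown(wait=False)  # do not block on the slower backends
    first = next(iter(done))
    return {"source": futures[first], "semantics": first.result()}


def fake_telegram(question: str) -> dict:
    """Hypothetical slow backend: the user takes a while to type."""
    time.sleep(0.5)
    return {"answer": "yes"}


def fake_picovoice(question: str) -> dict:
    """Hypothetical fast backend: the spoken answer is recognized right away."""
    return {"answer": "yes"}


result = query_all(
    {"telegram": fake_telegram, "picovoice": fake_picovoice},
    "Should I bring you a drink?",
)
print(result)
```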
Since many of the grammars are already defined in robocup_knowledge, maybe we can make the connection between grammar+target for grammar_parser and the intent and context_url for PicoVoice?
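One simple way to make that connection would be a lookup table kept next to the grammars. The sketch below assumes each grammar in robocup_knowledge can be given a stable identifier; the identifiers, targets and `context_url` values shown are hypothetical placeholders, and only the intent name declareName comes from the TODO list above.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class PicoVoiceQuery(NamedTuple):
    """What to send to PicoVoice instead of a grammar and target."""
    context_url: str
    intent: str


# Keys are (grammar identifier, target). All values are hypothetical placeholders.
GRAMMAR_TO_PICOVOICE: Dict[Tuple[str, str], PicoVoiceQuery] = {
    ("yes_no_grammar", "T"): PicoVoiceQuery("yes_no_context", "yesOrNo"),
    ("name_grammar", "T"): PicoVoiceQuery("name_context", "declareName"),
}


def lookup(grammar_id: str, target: str) -> Optional[PicoVoiceQuery]:
    """Return the PicoVoice equivalent of a grammar+target, or None if not yet defined."""
    return GRAMMAR_TO_PICOVOICE.get((grammar_id, target))


print(lookup("name_grammar", "T"))
print(lookup("gpsr_grammar", "T"))
```

A `None` result makes it explicit which grammars still lack a Context on the PicoVoice console, so a challenge could fall back to the grammar-based API for those.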
We might even be able to generate the .yaml files that PicoVoice can import to define a Context. That would allow us to 'compile' a grammar for PicoVoice and thus have a single source of truth.
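A minimal sketch of the last step of such a 'compiler' is shown below: it renders intents, expressions and slot values as YAML text. The layout (`context` with `expressions` and `slots`) and the `$slotType:slotName` notation are assumptions modelled on Rhino context files and should be checked against a file exported from the PicoVoice console. Extracting the intents from a grammar_parser grammar is left out, and `render_context_yaml` is a hypothetical name.

```python
from typing import Dict, List


def render_context_yaml(
    expressions: Dict[str, List[str]],
    slots: Dict[str, List[str]],
) -> str:
    """Render intents and slot values as YAML text.

    Assumes phrases and slot values contain no double quotes or backslashes.
    """
    lines = ["context:", "  expressions:"]
    for intent, phrases in expressions.items():
        lines.append(f"    {intent}:")
        for phrase in phrases:
            lines.append(f'      - "{phrase}"')
    lines.append("  slots:")
    for slot_type, values in slots.items():
        lines.append(f"    {slot_type}:")
        for value in values:
            lines.append(f'      - "{value}"')
    return "\n".join(lines) + "\n"


# Example input is made up; the slot notation is an assumption (see above).
yaml_text = render_context_yaml(
    expressions={"declareName": ["my name is $name:name"]},
    slots={"name": ["john", "mary"]},
)
print(yaml_text)
```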