
[Feature-Request] - introducing NLP

Open TuxSeb opened this issue 6 years ago • 6 comments

NLP, standing for Natural Language Processing, would allow us to analyse the user's speech structure more efficiently and do smarter things with OVIA.

Currently, OVIA is very simple: you give her a list of words to recognize in your plugin, and she will find them in the user's speech and act on them, nothing more. Using this matching, you can pronounce a sentence which contains your order and it will launch the attached task.

An example with the current weather module, if your location is set to London:

User: "OVIA, whats the weather ?" or User: "OVIA, weather" or User: "OVIA, can you give me the weather please?"

OVIA will always have the same answer:

OVIA: "Today is a rainy day in London"

But what if I want the New York weather?

User: "OVIA, what's the weather in New-York?"

User: "OVIA, weather in New-York?"

User: "OVIA, New-York Weather"

OVIA will always have the same answer:

OVIA: "Today is a rainy day in London"

It does not work

Why is OVIA talking about London's weather? Because it's hard-coded: she only recognizes keywords in the sentence like "weather" or "temperature" using the matching methods, and the city set up in profile.yml is London. Basically, when a module's keywords are detected, that specific module is triggered, the module uses the information available in profile.yml without going any further, and then the TTS says the prepared sentence. It's not "smart".
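To make the limitation concrete, here is a minimal sketch (illustrative only, not the actual OVIA/Naomi weather module) of how keyword matching plus a hard-coded profile city behaves:

    # Illustrative sketch only -- not the real OVIA/Naomi weather module.
    PROFILE = {"location": "London"}          # as loaded from profile.yml
    WEATHER_KEYWORDS = {"weather", "temperature"}

    def handle(text):
        words = set(text.lower().replace("?", "").split())
        if words & WEATHER_KEYWORDS:
            # The city mentioned in the sentence is never inspected; only the profile is used.
            return "Today is a rainy day in %s" % PROFILE["location"]
        return None

    print(handle("OVIA, what's the weather in New York?"))
    # -> "Today is a rainy day in London"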

Using NLP, we could intelligently recognize and analyze the function of each word, the structure of sentences, the places they mention, their meaning, and even the user's feelings, in order to adapt OVIA's behaviour and make her smarter, maybe even comparable to current proprietary solutions, where the notion of privacy doesn't exist.

Issue in progress

TuxSeb avatar May 11 '18 06:05 TuxSeb

https://spacy.io/ might fit our needs for fast offline NLP
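For instance (a minimal sketch, assuming the en_core_web_sm model has been downloaded), spaCy's named entity recognizer can pull the place out of the weather request above:

    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("OVIA, what's the weather in New York?")

    # Entities labelled GPE (geo-political entity) are candidate locations.
    places = [ent.text for ent in doc.ents if ent.label_ == "GPE"]
    print(places)  # expected: ['New York']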

redragonx avatar Aug 19 '18 04:08 redragonx

Someone previously created this PR https://github.com/NaomiProject/Naomi/pull/31 about adding Text To Intent handling.

From this we could add various NLP engines (NLTK, spaCy, ...) as plugins and not be tied to any single one.
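A rough sketch of what such a pluggable Text To Intent interface might look like (purely hypothetical class and method names, not taken from PR #31 or the Naomi codebase):

    # Hypothetical interface sketch -- names are not from PR #31 or the Naomi codebase.
    from abc import ABC, abstractmethod

    class TTIPlugin(ABC):
        """Base class a Text To Intent engine plugin would implement."""

        @abstractmethod
        def add_intent(self, name, templates):
            """Register an intent with its example phrases or keywords."""

        @abstractmethod
        def determine_intent(self, text):
            """Return (intent_name, confidence) for the given transcription."""

    # An NLTK-based or spaCy-based engine would subclass TTIPlugin,
    # so the rest of Naomi never depends on a specific NLP library.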

TuxSeb avatar Aug 19 '18 20:08 TuxSeb

I've been looking a lot lately at how to best represent intents. The Naomi system of intent parsing is currently using two different standard functions in each speechhandler plugin. First, Naomi loops through all the speechhandler plugins, starting with the highest priority (most speechhandlers use the default priority of 0, which causes them to be queried in a somewhat random but consistent order -- probably alphabetically by directory name) and passes the text to the "is_valid" method. That method then can return "true" if it thinks it can handle the request or “false” if not. If the response is true, then the text is passed to the plugin's "handle" method for further processing, otherwise the next plugin is checked. There is no ability to weigh different confidences, it’s just a straight ask every plugin if they can handle the request, and the first one that says they can wins.
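In rough pseudo-Python, that dispatch logic amounts to something like the following (a simplified sketch, not the actual Naomi source):

    # Simplified sketch of the current first-match dispatch -- not actual Naomi code.
    def dispatch(text, plugins):
        # Highest priority first; most plugins keep the default priority of 0,
        # so ties fall back to load order (roughly alphabetical by directory).
        for plugin in sorted(plugins, key=lambda p: getattr(p, "priority", 0), reverse=True):
            if plugin.is_valid(text):
                return plugin.handle(text)   # first plugin to claim the text wins
        return None  # no plugin claimed the utterance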

This is not very scalable, since it allows a plugin to dominate all others by pushing the priority really high and defining “is_valid” as simply:

    def is_valid(self, text):
        return True

Basically, the problem is that the first plugin that decides it wants to try to respond to a request may not be the best equipped to do so. Plugin authors have no idea what other plugin authors are doing, so it is important to have an algorithm act as an arbitrator.

There are basically two different approaches to intent parsing, which are illustrated well by the different approaches taken by Adapt and Padatious. Adapt builds up requests atomically, by defining lists of words and combining them into intents consisting of required words (basically the words checked by Naomi’s “is_valid”) and optional words (they add weight to the confidence, but don’t clinch it). Padatious takes example sentences and works out the closest match by studying both word frequency and word order.
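As a rough illustration of the two styles (a plain-Python sketch of the data each approach works from, not the real Adapt or Padatious APIs):

    # Conceptual sketch of the two styles -- not the real Adapt or Padatious APIs.

    # Adapt style: an intent is built from keyword lists, some required, some optional.
    adapt_style_weather = {
        "require": {"weather", "temperature", "forecast"},   # at least one must appear
        "optional": {"today", "tomorrow"},                   # these only add confidence
    }

    # Padatious style: an intent is a set of example sentences with slots,
    # matched by word frequency and word order.
    padatious_style_weather = [
        "what is the weather in {place}",
        "will it rain {when}",
        "give me the {when} forecast for {place}",
    ]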

I have started writing a Naomi intent parser, which uses Padatious style templates but parses sentence structure down to keywords. So far, this seems to work well, although some of the confidences are alarmingly low. I’m sure it can use work, but it’s a start. Most importantly, though, this works by converting Padatious style templates into Adapt style frequencies. Thus, we can instruct developers to use Padatious style templates and keyword lists, and convert them into Adapt style intents in the Adapt plugin for users who want to use the Adapt Intent Parser. Translating these intents to the Padatious parser would simply require a little tweaking of the structure, so that should be really easy.
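For flavour, a toy version of that template-to-frequency conversion might look like this (this is only an illustration of the general idea, not the actual intent_parser_tests code):

    # Toy illustration of deriving keyword frequencies from Padatious-style templates.
    # Not the actual intent_parser_tests implementation.
    from collections import Counter

    templates = [
        "what is the weather in {place}",
        "will it rain in {place} today",
        "give me the weather forecast",
    ]

    counts = Counter()
    for template in templates:
        for word in template.split():
            if not word.startswith("{"):          # skip slot placeholders
                counts[word] += 1

    total = sum(counts.values())
    # Words that appear in more templates carry more weight for this intent.
    weights = {word: count / total for word, count in counts.items()}
    print(weights["weather"], weights["rain"])    # 2/15 vs 1/15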

Standardizing the Text To Intent interface is, in my opinion, the last big hurdle before really pushing Naomi as the open source alternative. After that, things calm down a lot, and I plan to work mostly on building and improving Speech To Text training plugins.

If anyone wants to look at what I'm doing, I have created a repository at: https://github.com/aaronchantrill/intent_parser_tests

With regard to Naomi's own parsing, I have a lot of ideas about using phonemes instead of parsing all the way out to words (so intent parsing will act more like Soundex searching, meaning that we no longer have to worry about figuring out whether the user said "weather" or "whether"). I'd also like to implement Levenshtein distance comparisons for matching a word or phrase that was either misheard or mispronounced.
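For reference, a straightforward Levenshtein distance implementation such as the following could score near-miss transcriptions (a generic textbook version, not code from the repository above):

    # Textbook Levenshtein (edit) distance -- not taken from intent_parser_tests.
    def levenshtein(a, b):
        """Minimum number of single-character edits turning a into b."""
        previous = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            current = [i]
            for j, cb in enumerate(b, start=1):
                current.append(min(
                    previous[j] + 1,                 # deletion
                    current[j - 1] + 1,              # insertion
                    previous[j - 1] + (ca != cb),    # substitution
                ))
            previous = current
        return previous[-1]

    print(levenshtein("weather", "whether"))  # 2: a misheard word is still a close match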

Any thoughts?

aaronchantrill avatar Sep 27 '19 03:09 aaronchantrill

One concrete issue that NLTK would be able to handle easily enough is standardizing contractions in English (converting "WHAT'S" to "WHAT IS") in both templates and queries.

Another place would be stemming words in English, so the templates "DO I HAVE ANY EMAILS" and "CHECK MY EMAIL" would both be recognized to contain the word "EMAIL", and "WILL IT RAIN TODAY" and "IS IT RAINING IN PARIS" would both be recognized to contain the word "RAIN". This could greatly simplify the weather intent.

My understanding is that German does not use contractions. Stemming rules are different for different languages. So I'm thinking of setting it up so each language or locale can use a different function.
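A small sketch of both ideas, using NLTK's Snowball stemmers and a hand-rolled contraction table (the table and the per-language dispatch are my own illustration, not existing Naomi code):

    # Illustration only: the contraction table and per-language stemmer choice
    # are assumptions, not existing Naomi code.
    from nltk.stem.snowball import SnowballStemmer

    CONTRACTIONS = {"what's": "what is", "won't": "will not", "i'm": "i am"}

    def expand_contractions(text):
        return " ".join(CONTRACTIONS.get(w, w) for w in text.lower().split())

    STEMMERS = {"en": SnowballStemmer("english"), "de": SnowballStemmer("german")}

    def normalize(text, language="en"):
        stemmer = STEMMERS[language]
        return [stemmer.stem(w) for w in expand_contractions(text).split()]

    print(normalize("WHAT'S THE WEATHER"))        # ['what', 'is', 'the', 'weather']
    print(normalize("IS IT RAINING IN PARIS"))    # 'raining' stems to 'rain'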

I'd also like to use NLTK in the Naomi_TTI plugin to weight words by frequency, so words that appear often in English will contribute less weight to the decision.
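One simple way to get such weights would be an inverse-frequency score over a reference corpus (a sketch using NLTK's Brown corpus; the corpus choice and the formula are my assumptions):

    # Sketch: inverse-frequency word weights from the Brown corpus.
    # Corpus choice and weighting formula are illustrative assumptions.
    import math
    from nltk import FreqDist
    from nltk.corpus import brown   # requires: nltk.download('brown')

    freq = FreqDist(w.lower() for w in brown.words())
    total = freq.N()

    def weight(word):
        count = freq[word.lower()] + 1          # +1 avoids log(0) for unseen words
        return -math.log(count / total)

    # Common words ("the", "is") get low weight, rarer content words get high weight.
    print(weight("the"), weight("weather"))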

aaronchantrill avatar Mar 16 '20 04:03 aaronchantrill

I think this is kind of too big a concept for a single request. At this point, what we need is to start using tools like gensim to convert phrases into vectors, which should produce better accuracy than our current edit distance approach. We could also use natural language processing to make Naomi more aware of its environment, so it can start building a theory of mind about the people interacting with it. This is a huge and extremely interesting subject area with respect to artificial intelligence, and it just seems overwhelming as a single feature request. Can we break this out into some more concrete goals?
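As a tiny sketch of the phrase-vector idea, using averaged word vectors and cosine similarity (the hand-made vectors below are stand-ins for real embeddings that a tool like gensim would provide):

    # Sketch of phrase similarity via averaged word vectors and cosine similarity.
    # The tiny hand-made vectors stand in for real embeddings from gensim/word2vec.
    import numpy as np

    WORD_VECTORS = {
        "weather":  np.array([0.9, 0.1, 0.0]),
        "forecast": np.array([0.8, 0.2, 0.1]),
        "rain":     np.array([0.7, 0.3, 0.0]),
        "email":    np.array([0.0, 0.1, 0.9]),
    }

    def phrase_vector(words):
        vectors = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
        return np.mean(vectors, axis=0)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    query = phrase_vector(["will", "it", "rain"])
    print(cosine(query, phrase_vector(["weather", "forecast"])))  # high similarity
    print(cosine(query, phrase_vector(["email"])))                # low similarity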

aaronchantrill avatar Sep 25 '21 01:09 aaronchantrill

At this point, I am working on integrating Naomi with llama-style GGUF generative engines. These will currently run on a Raspberry Pi 5 and allow a more natural conversation. There are still challenges, such as the lack of punctuation in transcriptions, although I'm interested to see if whisper.ai adds punctuation. I'm also working on using pre-training, prompt engineering, and RAG to get models to activate specific plugins when appropriate.
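As a rough sketch of the prompt-engineering side, using llama-cpp-python (the prompt format, plugin list, and model path are illustrative assumptions, not the actual Naomi integration):

    # Rough sketch using llama-cpp-python; the prompt format, plugin list, and
    # model path are illustrative assumptions, not the actual Naomi integration.
    from llama_cpp import Llama

    llm = Llama(model_path="./models/naomi-7b-q4.gguf")  # any local GGUF model

    PROMPT = (
        "You are Naomi, a voice assistant. Available plugins: weather, email, clock.\n"
        "Reply with only the plugin name that should handle the request.\n"
        "Request: {request}\n"
        "Plugin:"
    )

    output = llm(PROMPT.format(request="will it rain in Paris today"),
                 max_tokens=8, stop=["\n"])
    print(output["choices"][0]["text"].strip())  # ideally: "weather"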

There is a lot that can be done with NLP, with various trade-offs like speed vs. accuracy, open-ended conversations vs. specific requests, etc.

aaronchantrill avatar Jan 20 '24 17:01 aaronchantrill