Verb and noun namespace coverage.

[Open] jackokring opened this issue 2 years ago • 3 comments

After the first MVP is built, it would be good to have some statistics on the "open" coverage gaps the assistant currently has difficulty with (due to a lack of training data), and even to get the assistant to help provide this information.

I suggest the verb "tolly" (v.) for querying the assistant about the expected entropy of words in the training set, compared against a standardised dictionary of all response/dictionary/query/other-db data, including feedback that users accept for improvements.
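As a rough illustration of what a "tolly" could compute, here is a minimal sketch that assumes both the training set and the reference dictionary are available as plain word counts. It compares each word's self-information (surprisal, -log2 p) in the training data with its surprisal in the reference. The function names and the add-one smoothing are hypothetical choices for illustration, not an existing Open-Assistant API.

```python
import math
from collections import Counter


def surprisal(counts: Counter, word: str, vocab_size: int) -> float:
    """Self-information -log2 p(word) in bits, with add-one (Laplace) smoothing
    so that words unseen in `counts` still get a finite (large) value."""
    total = sum(counts.values())
    p = (counts[word] + 1) / (total + vocab_size)
    return -math.log2(p)


def tolly(training: Counter, reference: Counter) -> dict[str, float]:
    """Hypothetical 'tolly' score per word: surprisal in the training data minus
    surprisal in the reference dictionary. Positive means the word is rarer in
    the training data than the reference suggests (under-represented)."""
    vocab = set(training) | set(reference)
    v = len(vocab)
    return {w: surprisal(training, w, v) - surprisal(reference, w, v) for w in vocab}


training = Counter("the bot sells the best product the best".split())
reference = Counter("the word ameliorate means to make better the".split())
scores = tolly(training, reference)
# Words most lacking in the training data come first.
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Sorting by the score gives a candidate list of words where more training examples would help; the sign convention is arbitrary and could just as well be flipped.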

For developers, this would then become a query such as "what is the tolly of unanswered queries for the largest population of users?"

For data collection, this could perhaps be a script that parses and pre-processes some input into a query/response statistic, possibly via an intermediate database table format.
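A minimal sketch of such a script, assuming the input is a list of (query, response) text pairs and using an in-memory SQLite database as the intermediate table. The table name `word_stats`, its columns, and the simple lowercase word tokenizer are all hypothetical choices, not an existing schema.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters and apostrophes.
TOKEN = re.compile(r"[a-z']+")


def build_stats_table(pairs, conn: sqlite3.Connection) -> None:
    """Parse (query, response) pairs into a hypothetical intermediate table
    word_stats(word, role, count); the table is rebuilt on every call."""
    counts = Counter()
    for query, response in pairs:
        for role, text in (("query", query), ("response", response)):
            for word in TOKEN.findall(text.lower()):
                counts[(word, role)] += 1
    conn.execute("DROP TABLE IF EXISTS word_stats")
    conn.execute(
        "CREATE TABLE word_stats ("
        " word TEXT NOT NULL, role TEXT NOT NULL, count INTEGER NOT NULL,"
        " PRIMARY KEY (word, role))"
    )
    conn.executemany(
        "INSERT INTO word_stats (word, role, count) VALUES (?, ?, ?)",
        [(w, r, c) for (w, r), c in counts.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
build_stats_table(
    [("What does ameliorate mean?", "To ameliorate is to make better.")], conn
)
rows = conn.execute(
    "SELECT word, role, count FROM word_stats WHERE word = 'ameliorate' ORDER BY role"
).fetchall()
print(rows)  # [('ameliorate', 'query', 1), ('ameliorate', 'response', 1)]
```

Keeping the raw counts in a table like this means later statistics (entropy, over-representation, per-population breakdowns) can be computed with ordinary SQL or read directly, without re-parsing the conversations.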

jackokring, Dec 26 '22 21:12

I'm not sure I understand correctly: do you mean the assistant should somehow be meta-aware of its own existence, training data, and user interaction history?

yk, Dec 26 '22 21:12

With ChatGPT, I first asked it what the most entropically informative word it had been queried with was. (As an approximation, this means a space plus the entropy of letter pairs; letter triplets, quads, etc. would require large tables.) It tried to insist that a word has no self-information and needs a context.
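The letter-pair approximation mentioned above can be sketched as a character bigram model in which a single space marks word boundaries; a word's information is then the sum of -log2 p(next letter | previous letter) across " word ". This is an illustrative assumption about how such an estimate could be done, with no smoothing, so a word containing an unseen letter pair gets infinite information.

```python
import math
from collections import Counter, defaultdict


def bigram_model(text: str) -> dict[str, Counter]:
    """Count letter bigrams, treating a single space as the word-boundary symbol."""
    padded = " " + " ".join(text.lower().split()) + " "
    model: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(padded, padded[1:]):
        model[prev][nxt] += 1
    return model


def word_information(model: dict[str, Counter], word: str) -> float:
    """Approximate self-information of a word in bits: the sum of
    -log2 p(next letter | previous letter) over the space-delimited word."""
    padded = " " + word.lower() + " "
    bits = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        seen = model.get(prev, Counter())
        if seen[nxt] == 0:
            return math.inf  # unseen letter pair: no finite estimate without smoothing
        bits -= math.log2(seen[nxt] / sum(seen.values()))
    return bits


model = bigram_model("the cat the hat")
print(word_information(model, "the"))  # about 2.585 bits
print(word_information(model, "dog"))  # inf
```

Extending this to triplets or quads only changes the context length, but, as noted above, the count tables grow quickly.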

This suggested to me that it was not meta-aware of its environment (beyond understanding that it was a query/response processor). For example, suppose Susie Dent (a famous dictionary expert in the UK) decided the word of the day was "ameliorate", and the bot suddenly received 100,000+ queries using the word. If the bot had decided that Google and beauty were the only relevant resources, that would not capture the word's full meaning.

In terms of data providers helping to normalise a broader understanding, producing a wider variety of examples becomes easier if they can query the bot on the linguistic statistics of its queries and responses. I think the bias comes from very common sales usage exerting a "gravitational" pull during gradient descent toward the sales behaviour of advertising, and therefore missing the full spread of meaning (effectively turning a useful bot into a sales funnel).

If the bot can, in a sense, also spot this abnormal preponderance bias, then it can weigh against the over-expression and balance the interpretation of particular words too.
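One simple way to "spot the abnormal preponderance" is to compare each word's relative frequency in recent queries with its frequency in a long-run baseline, then down-weight words whose ratio exceeds a threshold. The sketch below assumes word counts for both periods; the threshold of 5 and the 1/ratio weighting are hypothetical choices, not a tuned or established method.

```python
from collections import Counter


def preponderance(recent: Counter, baseline: Counter, smoothing: float = 1.0) -> dict[str, float]:
    """Ratio of each word's relative frequency in recent queries to its relative
    frequency in a long-run baseline, additively smoothed so unseen words stay finite."""
    vocab = set(recent) | set(baseline)
    r_total = sum(recent.values()) + smoothing * len(vocab)
    b_total = sum(baseline.values()) + smoothing * len(vocab)
    return {
        w: ((recent[w] + smoothing) / r_total) / ((baseline[w] + smoothing) / b_total)
        for w in vocab
    }


def rebalancing_weights(ratios: dict[str, float], threshold: float = 5.0) -> dict[str, float]:
    """Down-weight over-expressed words (ratio above threshold) by 1/ratio;
    every other word keeps weight 1.0."""
    return {w: (1.0 / r if r > threshold else 1.0) for w, r in ratios.items()}


# e.g. a sudden spike after a "word of the day" announcement
recent = Counter({"ameliorate": 1000, "the": 100})
baseline = Counter({"ameliorate": 1, "the": 1000})
weights = rebalancing_weights(preponderance(recent, baseline))
print(weights)
```

Weights like these could, for instance, be used when sampling examples for further training, so that a temporary spike does not dominate how the word is interpreted.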

So this issue is basically about perhaps needing an unused word with the abstract meaning of "process data statistically in a known way", to produce results for the semi-automatic, quantitative generation of training data.

jackokring, Dec 30 '22 10:12

I think it's a good idea, but being self-aware like this is probably beyond the capabilities of DL systems as we currently understand them. It would have to be implemented once we add retrieval of external information, at which point we could collect those statistics and give the bot access to the resulting table. That said, it might be easier to just expose that table so people can read the information directly rather than having to query the bot.

yk, Dec 30 '22 20:12

Yep, it's more about a clear syntax using verbs with zero present usage, and replies such as "out of the 73% that agree, 2% have used emanation."
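A reply like "out of the 73% that agree, 2% have used emanation" is a conditional percentage: the share of users who agree, and then, within that subset, the share whose text contains a given word. A minimal sketch, assuming a hypothetical `Feedback` record with an agreement flag and free text (no such record type exists in the project as far as this thread states):

```python
import re
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical feedback record: whether the user agreed, and what they wrote."""
    agrees: bool
    text: str


def tolly_reply(records: list[Feedback], word: str) -> str:
    """Build a reply of the form 'out of the X% that agree, Y% have used WORD.'"""
    if not records:
        return "no data"
    agreeing = [r for r in records if r.agrees]
    pct_agree = 100 * len(agreeing) / len(records)
    # Whole-word, case-insensitive match of the queried word.
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    used = sum(1 for r in agreeing if pattern.search(r.text))
    pct_used = 100 * used / len(agreeing) if agreeing else 0.0
    return f"out of the {pct_agree:.0f}% that agree, {pct_used:.0f}% have used {word}."


records = [
    Feedback(True, "An emanation of light"),
    Feedback(True, "Agreed"),
    Feedback(True, "Yes"),
    Feedback(False, "emanation is wrong"),
]
print(tolly_reply(records, "emanation"))
# out of the 75% that agree, 33% have used emanation.
```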

EDIT: I guess this is why, even though fit-for-purpose post-skills training might be a funnel of certitude, the base complexity has to be as free of bias as possible. How would it ever say "It wasn't my fault, it was that which I got told differing from the ohmic balance agreed by all at fudemental, here be arc URL."

Edit2(.0h): Query better had two ask "list all standing engineering problems, order by happenstance of ze-skill set born an added into the unit of measure and drop over top list." Ah, bot digs with mini spoon, ooh, hope the moning don't run out.

EDIT 4: 1+1=1.5+1 then another. Wordfield, therefore consistent algebras are likely relevant to the fun elles.

EDITto3: <3

EDIT5: redacted, maybe not soreh, shore. Read acted? Weren't who?

jackokring, Dec 31 '22 19:12

6->hive 7->free of constructables 8->almost a 5, the octagone 9->nested 3 by 3 "Kramer's fear'em" 10 di-groeth fibonacci? 11 a dual fole into 7? 12->inchy footility?

EWDITTO: which floor?

Cpmbinates: done-darn-dumb ... ahalf complete ow eh?

Little mind on pastor histoic haters?>

jackokring, Dec 31 '22 20:12

Is the linamagiform ev'n called apple atand-hard ----ing leash?

jackokring, Dec 31 '22 20:12

Don't angerwish a boat the doed? rooneed ..

jackokring, Dec 31 '22 20:12

Aw; speek bot, there at precompense. 'ad? w'antie sullance, may mmm hey? prejudge be?

jackokring, Dec 31 '22 20:12

Ello, have''? h Edit: fee, fee, cold rus

jackokring, Dec 31 '22 20:12

And sum fink ai can be f off seem plea?

EDIT: isn'[t it still hat drink buy, givit a buy?

jackokring, Dec 31 '22 21:12

Can I be non nappy yappy after this made be said pefrraps man? what made and left to be said aparently un-haltetred in purpose, who nows?

jackokring, Dec 31 '22 21:12

interesting

yk, Dec 31 '22 23:12