Open-Assistant
Open Assistant is particularly bad at extracting info from contexts -> We need more context examples <-
@yk I have performed quite a few tests with Open Assistant, and while it performs extremely well in casual conversation and in a bunch of other scenarios, I have noticed that it is particularly bad at extracting info from contexts. I also noticed that there are very few prompts in the dataset where users dump in a chunk of information and let the AI extract it, which often leads to the assistant ignoring the true information and hallucinating, i.e. making something up, while the true answer is right there in the prompt/context.
Why is that important?
I know that it's incredibly important to build an open ChatGPT alternative that aims at general capabilities. But I think that in the future the biggest value of open systems like Open Assistant will be that they can be used to extract true information from databases/dumps/contexts/tools and truthfully answer users based on those documents.
How could we make sure to include more context-extraction prompts in the Open Assistant dataset?
I have labeled a bunch of messages and added assistant answers, but there are way too many prompts in the pipeline already, so I cannot add these context-example prompts properly.
What sampling settings have you found to most influence information extraction? Did it get at all better with reduced temperature?
Please share your sampling settings and how/where you set them - some of them may well be related to the quality of recall you are seeing at inference time.
I have tried changing the different values, and, as expected, temperature helps extraction the most; something like 0.1-0.2 works best with the LLaMA-30B model. That model handles English and Spanish relatively well (though there are still hallucinations from time to time). Russian is worse but still in a somewhat acceptable spot, but it really starts hallucinating a lot in the other languages I tested, e.g. German and French.
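To illustrate why a low temperature helps with extraction, here is a minimal sketch (toy logits, plain Python, not Open Assistant's actual inference code) of how dividing the logits by the temperature before the softmax concentrates probability mass on the highest-scoring token, making the model much more likely to copy the answer that is already in the context instead of sampling a hallucinated alternative:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before the softmax; lower
    temperatures sharpen the distribution toward the top token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits: the token supported by the context
# only slightly outranks two hallucinated alternatives.
logits = [2.0, 1.5, 0.5]

p_hot = softmax_with_temperature(logits, 1.0)   # fairly flat distribution
p_cold = softmax_with_temperature(logits, 0.2)  # mass concentrates on top token

print(p_hot)
print(p_cold)
```

At temperature 1.0 the correct token gets only a modest majority of the probability mass, while at 0.2 it dominates, which matches the observation that 0.1-0.2 works best for extraction.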
The pythia-12b-epoch-3.5 model is generally worse (as expected, given fewer tokens seen) and really struggles with all languages and extraction tasks, but (just as with LLaMA-30B) the languages with less fine-tuning data are way worse than English or Spanish.
Nevertheless, I think the model is on a good path, and imo it already performs better than RWKV-based models and LLaMA models fine-tuned on the ShareGPT or GPT4All datasets (at least in English and Spanish). But there is still a lot of room for improvement on context-extraction tasks, since there are still quite a few issues...
I think it would be very useful to add a bunch of prompts with context, and especially guidelines for those creating the initial prompts to also include context from time to time...
Best Parameters & Model for Context Extraction (that I found):
- Repetition penalty: 1.2 <- increasing it too much usually makes it worse
- Temperature: 0.2 <- it's all about temperature
- Top K: 50
- Top P: 0.95 <- lowering it too much usually makes it worse
- Typical P: 0.5
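For anyone unsure what Top K and Top P actually do to the next-token distribution, here is a small pure-Python sketch of the two filters (repetition penalty and Typical P are omitted for brevity; this is an illustration, not the actual sampler implementation):

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Restrict a next-token distribution to the top_k most likely
    tokens, then to the smallest high-probability prefix whose
    cumulative mass reaches top_p, and renormalize the survivors."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cum = [], 0.0
    for idx, p in ranked:
        kept.append((idx, p))
        cum += p
        if cum >= top_p:
            break  # nucleus reached: drop the remaining long tail
    total = sum(p for _, p in kept)
    return {idx: p / total for idx, p in kept}

# Toy 5-token vocabulary: with top_p=0.95 the two tail tokens are cut off.
probs = [0.50, 0.30, 0.15, 0.04, 0.01]
print(top_k_top_p_filter(probs, top_k=50, top_p=0.95))
```

Lowering Top P shrinks the nucleus further, which is why pushing it too low hurts: eventually it removes plausible continuations, not just the noisy tail.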
I have noticed that the formatting has a crucial impact on the response when you are trying to include context. Which token formatting are you using to construct your prompt with context, @Logophoman?
> I think it might be very useful to add in a bunch of prompts with context and especially guidelines for those creating the initial prompts to also add in context from time to time...
I second this! I cannot find any official example of incorporating context. I found this page, but again, most examples don't work, some don't apply to the 12b model, and the ones that do work go wild if you miss even one newline! Some best practices would be great.
https://github.com/LAION-AI/Open-Assistant/blob/main/model/MESSAGE_AND_TOKEN_FORMAT.md
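For reference, a minimal sketch of how one might assemble a single-turn extraction prompt in the `<|prompter|>`/`<|assistant|>` token format that the linked document describes. The exact special tokens can differ between models, so treat this as an assumption and check the doc for the model you are running; the context and question below are made up for illustration:

```python
def build_prompt(context: str, question: str) -> str:
    """Assemble a single-turn Open-Assistant-style prompt: the user
    turn (context plus question) wrapped in <|prompter|>...<|endoftext|>,
    followed by <|assistant|> to cue the model's reply. Token names
    follow MESSAGE_AND_TOKEN_FORMAT.md; verify them for your model."""
    user_turn = f"{context}\n\n{question}"
    return f"<|prompter|>{user_turn}<|endoftext|><|assistant|>"

# Hypothetical context-extraction example.
prompt = build_prompt(
    "The Eiffel Tower was completed in 1889 and is 330 m tall.",
    "When was the Eiffel Tower completed?",
)
print(prompt)
```

Building the prompt programmatically like this also avoids the "miss one newline and it goes wild" failure mode, since the token layout is fixed in one place instead of being retyped by hand.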