
[Feature] Models to inform you it does not know something or has incomplete data

Daza99 opened this issue 10 months ago

Feature Request

A common issue with LLMs is asking a 'general' question about a topic the model was not trained on, or has little or no information about: it will often fill the gaps creatively when coming up with an answer, or state something that is plainly false. It would be great if there were a stop-gap measure for when the model has incomplete data or none at all. If it completely lacks data on the question, the response would say (in red or cyan text) "I do not know." If it has only partial data, it could say "I have incomplete data, but what I do have on the topic is X; you could try getting more information from these sources." Also, if the user's question is too broad, it could ask the user to redefine or narrow the question, which could save computing time too.

This feature could give a model more credibility if it doesn't switch to creative mode when it doesn't know something or has incomplete data, and instead simply states that it does not know, or warns that its answer might be incorrect. At least then we can check other sources for an answer.

I have a feeling that if this were easy to implement, it would already be a feature. But I thought it was worth suggesting, because this is one major issue I have when seeking answers to questions: not knowing how accurate the answer is. Maybe it is simply not possible because of the sheer scope of questions and topics.

Daza99 avatar Apr 29 '24 22:04 Daza99

That's not really possible, or at least no one has come up with a good way to do that, yet.

They don't know what they don't know. And they'll happily make things up regardless (which is called hallucinating here).

What you can do, however, is augment the conversation with your own, external data. See LocalDocs.
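
To illustrate what that augmentation looks like in practice, here is a minimal retrieval sketch using the gpt4all Python bindings' `GPT4All` and `Embed4All` classes. The model file name and the document snippets are placeholders, and LocalDocs already does this for you inside the GUI, so this is only a rough approximation of the idea:

```python
# Rough sketch of retrieval-augmented prompting, similar in spirit to LocalDocs.
# Assumes the gpt4all Python bindings; model file and documents are placeholders.
from gpt4all import GPT4All, Embed4All
import numpy as np

docs = [
    "Our warranty covers manufacturing defects for 24 months.",
    "Returns must be initiated within 30 days of delivery.",
]

embedder = Embed4All()
doc_vecs = np.array([embedder.embed(d) for d in docs])

def retrieve(question: str, top_k: int = 1):
    q = np.array(embedder.embed(question))
    # Cosine similarity between the question and each document snippet.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:top_k]]

question = "How long is the warranty?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")  # placeholder model file
print(model.generate(prompt, max_tokens=200))
```

The model can still hallucinate, but grounding the prompt in text it can quote from makes that much less likely.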

cosmic-snow avatar May 01 '24 18:05 cosmic-snow

One could check whether the words in the user prompt match words in the LocalDocs, basically a duplicate check. But that alone would not be a sufficient solution, because the clear advantage and technological breakthrough of LLMs is that "nearest neighbour" data, i.e. data selected by its "similarity" in the embeddings, is also included. Even then, an LLM might ignore the embeddings, depending on how and on what training data it was trained and fine-tuned. Pick a different model and it will behave completely differently.
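
For what it's worth, both the "duplicate check" and the similarity-based variant can be expressed as a threshold on a score. A hypothetical sketch follows; the `keyword_overlap`/`should_answer` helpers and the 0.6 cutoff are assumptions for illustration, not anything GPT4All ships:

```python
# Hypothetical gating heuristic: decline to answer when the prompt is not
# sufficiently similar to anything in the local documents. The helpers and
# thresholds below are placeholders, not part of GPT4All.
import numpy as np

def keyword_overlap(prompt: str, doc: str) -> float:
    # Crude "duplicate check": fraction of prompt words that appear in the doc.
    p, d = set(prompt.lower().split()), set(doc.lower().split())
    return len(p & d) / max(len(p), 1)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_answer(prompt_vec: np.ndarray, doc_vecs: list, threshold: float = 0.6) -> bool:
    # Nearest-neighbour check over embeddings instead of exact word matches.
    return max(cosine(prompt_vec, d) for d in doc_vecs) >= threshold

# If should_answer(...) is False, the app could prepend a warning or reply
# "I don't have data on this" instead of letting the model guess.
```

Even with such a gate, the model may still ignore the retrieved context, which is why this remains a heuristic rather than a real solution.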

Closing as this feature is not trivial to implement.

ThiloteE avatar Jun 13 '24 14:06 ThiloteE