
Models Fail to Process Large Embedded File with LocalDB Embedding

Open l1v0n1 opened this issue 8 months ago • 4 comments

Bug Report

I embedded a large file with localdb embedding in GPT4All. The file contains a conversation between two people and has over 90,000 lines of messages. After embedding, I tried several models in GPT4All, including:

  • deepseek-r1-distill-qwen-7b
  • llama 3 8b
  • reasoner v1
  • mistral instruct

None of the models provided accurate results. The deepseek-r1-distill model at least attempted to process the data, but it often gave incorrect or incomplete answers. For example, when I asked, "What conversations did users have between 2024-10-01 and 2024-10-16?", the model either ignored the date range or skipped many messages and responded that there were no messages in that period. Similarly, when I asked, "What conversations did users have on 2024-09-26?", the model responded with:

Based on the provided context:
Answer:
There are no specific conversations recorded by users named Kamil and Lana on September 26, 2024. The earliest entry in the context is from September 27th onwards.
If you need further assistance or have more data for that date, please provide additional information!

This suggests that only a small portion of the embedded chunks is reaching the model during retrieval, which seems to be the core issue.
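For what it's worth, a quick way to confirm that the missing answers are a retrieval problem rather than a data problem is to list the matching lines directly from the source file. The sketch below is a minimal, hypothetical check; it assumes each message line begins with an ISO-style date such as `2024-09-26 14:05 Kamil: ...` (the real file format may differ):

```python
# Hypothetical sanity check: confirm the source log actually contains
# messages in the date range the model claims is empty.
# Assumes each line starts with an ISO-style date, e.g. "2024-09-26 14:05 Kamil: hi".
import re
import sys
from datetime import date

LINE_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")

def messages_in_range(path, start, end):
    """Yield every line whose leading date falls inside [start, end]."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = LINE_DATE.match(line)
            if m and start <= date(*map(int, m.groups()))) <= end:
                yield line.rstrip("\n")

if __name__ == "__main__":
    # e.g. python check_range.py chat.txt 2024-09-26 2024-09-26
    path, start, end = sys.argv[1], sys.argv[2], sys.argv[3]
    hits = list(messages_in_range(path, date.fromisoformat(start), date.fromisoformat(end)))
    print(f"{len(hits)} messages between {start} and {end}")
    for line in hits[:20]:
        print(line)
```

If this prints messages for 2024-09-26 while the model insists there are none, the data is present in the file but is not being surfaced by the embedding/retrieval step.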

Steps to Reproduce

  1. Embed a large file with over 90,000 lines of conversation using localdb embedding in GPT4All (a sketch for generating a comparable synthetic log is included after this list).
  2. Try querying the data using various models, including deepseek-r1-distill-qwen-7b, llama 3 8b, reasoner v1, and mistral instruct.
  3. Ask specific questions about conversations within a date range or on a specific date.
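For anyone trying to reproduce this without the original file, here is a minimal sketch that generates a similar two-person log; the names, timestamp format, and message text are placeholders, not the actual data:

```python
# Hypothetical generator for a large two-person conversation log,
# roughly matching the ~90,000-line file described above.
from datetime import datetime, timedelta
import random

def write_synthetic_log(path="chat.txt", n_lines=90_000,
                        start=datetime(2024, 9, 1)):
    speakers = ["Kamil", "Lana"]  # placeholder names
    ts = start
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n_lines):
            # advance time a little so the log spans roughly Sept-Nov 2024
            ts += timedelta(seconds=random.randint(30, 90))
            speaker = speakers[i % 2]
            f.write(f"{ts:%Y-%m-%d %H:%M} {speaker}: message number {i}\n")

if __name__ == "__main__":
    write_synthetic_log()
```

Embedding the resulting `chat.txt` via LocalDocs and asking the date-range questions above should reproduce the behavior described.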

Expected Behavior

The model should accurately process and retrieve the conversations within the specified date range or on the specific date without skipping messages or providing incorrect information.

Your Environment

  • GPT4All version: 3.10.0
  • Operating System: macOS Sequoia 15.3.1
  • Chat model used (if applicable): DeepSeek-R1-Distill-Qwen-7B, Llama 3 8B, Reasoner v1, Mistral Instruct

l1v0n1 · Mar 09 '25 13:03