gpt4all
Collections that are missing embeddings can get stuck that way until an explicit re-index
Bug Report
Collections created before the update to 2.7.4 do not work after the update. Only collections created in 2.7.4 work.
Steps to Reproduce
- Create collection in version 2.7.3 or older.
- Update to 2.7.4
- Start new chat
- Select LocalDocs Collections that were made before the update.
- Reference all collections in chat to pull context from each collection.
- LocalDocs will not find contents in collections made before the update to 2.7.4
- Create and add a new collection using 2.7.4
- Include new collection in selected LocalDocs collections
- Reference all collections in chat to pull context from each collection.
- LocalDocs will find contents in newly created collections only.
Expected Behavior
All collections were expected to function as usual.
Your Environment
- GPT4All version: 2.7.4
- Operating System: Win11
- Chat model used (if applicable): SBert-LocalDocs [model-all-MiniLM-L6-v2.gguf2.f16.gguf]
I second this.
The program starts searching the selected collections. I tried this with 4 collections, and the "searching in localdocs:..." message is visible for only a fraction of a second.
It then immediately switches to the default "generating response..." and "processing..." states without parsing the collections. They are mentioned at the beginning but never really used.
I am able to reproduce this issue using a copy of some of 3Simplex's collections. It seems like the embeddings are missing for certain documents, due to the process getting interrupted somehow. These documents would have been re-indexed on every launch in previous versions of GPT4All because their modification timestamp did not match the database. Now they are only re-indexed the first time GPT4All v2.7.4 is started, and if that did not succeed then the collections will be broken until they are once again re-indexed (e.g. by changing the document snippet size) and it completes successfully.
We need to implement a way to know whether embeddings have been generated for a chunk so the program can continue where it left off.
I have also tried what 3Simplex suggested, changing the contents of a folder that is used as a collection. Here's what I did:
- removed one file (cut and pasted it one level up) from a folder that was already registered as a LocalDocs collection
- after removing the file, the program did not reindex the collection
- after pasting the file back into its folder, the program started reindexing that collection
I did this with 3 distinct files in 3 distinct folders/categories. The result was the same: those collections were reindexed.
However, the issue of existing collections being reindexed is still here. Immediately after program start, I see several collections being indexed again that were created even before 2.7.3 (I can't remember whether it was 2.6.1 or a 2.7.x) and have stayed that way since then...
Edit :) - cebtenzzre's explanation clarifies why this happens. Indeed, a flag or something similar would be handy, like how Windows knows that it didn't shut down properly :)
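The "didn't shut down properly" idea could look roughly like the sketch below: write a marker before indexing starts and remove it only after indexing completes, so a leftover marker on the next launch means the last run was interrupted. This is only an illustration. The marker file name, the `state_dir` parameter, and both function names are hypothetical and not part of GPT4All.

```python
import os
import tempfile

MARKER_NAME = "indexing.in_progress"  # hypothetical marker file name


def needs_reindex(state_dir):
    # A leftover marker means the previous indexing run never finished.
    return os.path.exists(os.path.join(state_dir, MARKER_NAME))


def run_indexing(state_dir, index_fn):
    marker = os.path.join(state_dir, MARKER_NAME)
    with open(marker, "w") as f:
        f.write("started")
    index_fn(state_dir)  # if this raises or the process dies, the marker stays
    os.remove(marker)    # reached only after indexing completed


if __name__ == "__main__":
    state_dir = tempfile.mkdtemp()

    def interrupted(_):
        raise RuntimeError("simulated crash during indexing")

    try:
        run_indexing(state_dir, interrupted)
    except RuntimeError:
        pass
    print("reindex needed:", needs_reindex(state_dir))

    run_indexing(state_dir, lambda _: None)  # a run that completes
    print("reindex needed:", needs_reindex(state_dir))
```

A single flag like this only says that a run was interrupted. It does not say which documents are missing embeddings, so it would trigger a full re-index instead of resuming.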
Should be fixed as of #2396 (aside from #2591, which is a related but distinct issue).