[Bug]: Adding records with same id accidentally leads to log messages on every subsequent operation
What happened?
I was inserting data chunks through my custom code, which updates parameters such as the start ID and metadata every time I add new data. In one instance I forgot to change the start ID, so the data was added with IDs that already existed in the ChromaDB persistent client.
I know the existing data was not updated, and that I just need to re-add the data with the correct start ID. But on every new operation, the same warning messages still pop up:
Add of existing embedding ID: 1241
Add of existing embedding ID: 1242
Add of existing embedding ID: 1243
Add of existing embedding ID: 1245
Add of existing embedding ID: 1244
Add of existing embedding ID: 1245
Insert of existing embedding ID: 1241
Insert of existing embedding ID: 1242
Insert of existing embedding ID: 1243
Insert of existing embedding ID: 1244
Insert of existing embedding ID: 1245
I searched for a solution and found that ChromaDB stores all operations in chroma.sqlite under embedding_queue.
To get rid of the warnings, I manually deleted the embedding_queue entries for the duplicate-ID operations I had executed through my code. It worked flawlessly and solved the issue. I believe this is a bug or issue.
But I want to know: will manually changing these entries in embedding_queue affect my DB? I'm careful while doing these deletions.
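For anyone who wants to look at the queue before deleting anything, here is a minimal read-only sketch using Python's standard sqlite3 module. The helper name count_queue_rows is hypothetical, and the default table name embedding_queue is taken from the description above; it is an assumption and may differ between Chroma versions, so the function returns None when no such table exists.

```python
import sqlite3
from pathlib import Path


def count_queue_rows(db_path, table="embedding_queue"):
    """Return the number of rows in `table`, or None if the table is absent.

    `table` defaults to the name used in this issue (an assumption, not a
    confirmed part of Chroma's schema).
    """
    # Open the file in read-only mode so this inspection cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        found = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table,),
        ).fetchone()
        if found is None:
            return None
        # The name comes back from sqlite_master, so quoting it is enough here.
        return conn.execute(f'SELECT COUNT(*) FROM "{found[0]}"').fetchone()[0]
    finally:
        conn.close()
```

Running this against a copy of the database file, rather than the live one, keeps the original untouched while you investigate.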
My ChromaDB version is 0.6.3, using a persistent instance. I'm using the DB Browser for SQLite application to edit chroma.sqlite. My OS is macOS Sonoma 14.4.1.
Versions
ChromaDB: 0.6.3, Python: 3.9.21, macOS: Sonoma 14.4.1
Relevant log output
@vikas2131, I think this is a duplicate of #4076. Can you take a look at the discussion there and let me know if it explains how Chroma works? https://github.com/chroma-core/chroma/issues/4076#issuecomment-2763367067
May I suggest you try upsert() so that 1) you don't see these warnings, 2) your end result overrides the existing doc if it exists, or creates a new one if it doesn't, 3) you don't have to fiddle around with Chroma internals to get this to work.
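A minimal sketch of what that could look like, assuming chunks are stored under consecutive string IDs starting from a start ID as described in the issue. The helper upsert_chunks, the path ./chroma_data, the collection name chunks, and the sample vectors are all hypothetical; the Chroma calls used are PersistentClient, get_or_create_collection, upsert, and get.

```python
def upsert_chunks(collection, start_id, documents, embeddings):
    """Upsert chunks under consecutive string IDs beginning at start_id.

    `collection` is any object exposing Chroma's Collection.upsert method.
    """
    ids = [str(start_id + i) for i in range(len(documents))]
    # upsert() overwrites records whose IDs already exist and creates the rest.
    collection.upsert(ids=ids, documents=documents, embeddings=embeddings)
    return ids


def demo():
    """Illustrative usage; requires the third-party chromadb package."""
    import chromadb  # imported lazily so the helper above has no hard dependency

    client = chromadb.PersistentClient(path="./chroma_data")  # hypothetical path
    collection = client.get_or_create_collection("chunks")  # hypothetical name

    vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # placeholder embeddings
    upsert_chunks(collection, 1241, ["first chunk", "second chunk"], vectors)

    # Reusing the same start ID by mistake now overwrites the two records
    # instead of attempting to add duplicates.
    upsert_chunks(collection, 1241, ["first chunk (revised)", "second chunk"], vectors)
    print(collection.get(ids=["1241"])["documents"])
```

Passing embeddings explicitly keeps the sketch independent of any embedding function; if your collection computes embeddings itself, that argument can be dropped from the upsert call.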
@vikas2131 did you get to the bottom of this? Was @tazarov's comment helpful here?
Closing now since it was addressed here: https://github.com/chroma-core/chroma/issues/4076