chroma Persistent DB

Data persistence behavior does not seem aligned with what folks may expect given other DBs.

For example, if I make a MongoDB data/db folder for development, I can connect and use that path to load the same database information.

However, in Jupyter notebooks this does not seem to be the case with Chroma: new DBs are created on every start, and they are hashed, which makes it difficult to interpret their purpose on the filesystem.

Is there a philosophy that needs to be documented or is this a bug? Am I missing something? Thank you.

Apr 11 '23 14:04 mxchinegod

Data persistence is an option with Chroma, but it's not the default. The logs even draw attention to this: "Using embedded DuckDB without persistence: data will be transient".

Are you suggesting that persistence should become the default option and transient should be opt-in? Or that the getting started tutorials should highlight this more clearly? Or... ?

Apr 11 '23 19:04 PaulMest

When restarting the kernel in Jupyter, a new collection was created with the same name, with no alert that it already existed, even though persistence was enabled.

Apr 11 '23 23:04 mxchinegod

@DylanAlloy can you share any code to reproduce this? One issue with notebooks is that their GC is not always reliable, so it's a good idea to call client.persist() manually. (Normally GC will run __del__ on the client, which calls .persist().)

Apr 12 '23 18:04 jeffchuber

We changed away from __del__ to atexit, which will help here. Closing this for now, but happy to re-open.

May 08 '23 17:05 jeffchuber
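
For readers following the thread, here is a minimal sketch of the pattern discussed above: flushing data with an explicit persist() call and an atexit hook rather than relying on __del__. ToyStore, its store.json file, and the add method are hypothetical names used only for illustration. This is not Chroma's API or implementation, just a standard-library stand-in under those assumptions.

```python
import atexit
import json
import os
import tempfile


class ToyStore:
    """Hypothetical stand-in for a persistent client; not Chroma's API."""

    def __init__(self, persist_directory):
        os.makedirs(persist_directory, exist_ok=True)
        self.path = os.path.join(persist_directory, "store.json")
        # Reload earlier data if this directory was used before.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.data = json.load(f)
        else:
            self.data = {}
        # Flush at interpreter exit instead of relying on __del__,
        # which the garbage collector may run late or not at all.
        atexit.register(self.persist)

    def add(self, key, value):
        self.data[key] = value

    def persist(self):
        # Write the in-memory data to disk.
        with open(self.path, "w") as f:
            json.dump(self.data, f)


directory = tempfile.mkdtemp()
store = ToyStore(directory)
store.add("doc1", "hello")
store.persist()  # explicit flush, e.g. before restarting a notebook kernel

reopened = ToyStore(directory)  # simulates a fresh session on the same path
print(reopened.data)
```

In a notebook, the explicit persist() call is the part to rely on, since a kernel restart may not give exit hooks a chance to run.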
