Chroma API change for 0.4.0 version

Open jeffchuber opened this issue 1 year ago • 15 comments

** This should land Monday the 17th **

Chroma is upgrading from 0.3.29 to 0.4.0. 0.4.0 is easier to build, more durable, faster, smaller, and more extensible. This comes with a few changes:

  1. A simplified and improved client setup. Instead of having to remember weird settings, users can just do EphemeralClient, PersistentClient or HttpClient (the underlying direct Client implementation is also still accessible)

  2. We migrated the data stores away from duckdb and clickhouse. This changes the API for the persistent client, which used to be configured with chroma_db_impl="duckdb+parquet". Now we simply set is_persistent=True, and PersistentClient sets it for you.

  3. Because we migrated away from duckdb and clickhouse, users need to migrate their data into the new layout and schema. Chroma is committed to providing extensive notification and tooling around any schema and data migrations (for example - this PR!).
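
A minimal sketch of the three entry points named in item 1, assuming chromadb 0.4.0 or later is installed. The make_client helper is hypothetical and exists only for illustration; it is not part of Chroma or GPTCache. Keyword arguments such as path or host are passed straight through to the Chroma constructors.

```python
def make_client(kind: str = "ephemeral", **kwargs):
    """Return a Chroma client of the requested kind (hypothetical helper)."""
    if kind not in ("ephemeral", "persistent", "http"):
        raise ValueError(f"unknown client kind: {kind!r}")

    # Imported lazily so the kind check above works without chromadb installed.
    import chromadb  # assumes chromadb >= 0.4.0

    if kind == "ephemeral":
        # In-memory only; data is lost when the process exits.
        return chromadb.EphemeralClient()
    if kind == "persistent":
        # Stores data on disk, e.g. make_client("persistent", path="./chroma_data").
        return chromadb.PersistentClient(**kwargs)
    # Talks to a running Chroma server, e.g. make_client("http", host="localhost").
    return chromadb.HttpClient(**kwargs)
```

With PersistentClient there is no need to pass is_persistent yourself, which is the simplification described in item 2.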

After upgrading to 0.4.0, if users try to access data that was stored in the previous format, the system will throw an Exception and tell them how to use the migration assistant to migrate their data. The migration assistant is a pip-installable CLI: install it with pip install chroma_migrate and run it by calling chroma_migrate.

Please reference the readme at chroma-core/chroma-migrate to see a full write-up of our philosophy on migrations as well as more details about this particular migration.

Please direct any users facing issues upgrading to our Discord channel called #get-help. We have also created an email listserv to notify developers directly about future breaking changes.

TODO

  • [x] Migrated any duckdb+parquet strings to the new format
  • [ ] Notified users about the breaking change (this PR, other suggestions?)

jeffchuber avatar Jul 17 '23 05:07 jeffchuber

Welcome @jeffchuber! It looks like this is your first PR to zilliztech/GPTCache 🎉

sre-ci-robot avatar Jul 17 '23 05:07 sre-ci-robot

please make the dev branch as the target branch

SimFG avatar Jul 18 '23 02:07 SimFG

@SimFG done!

jeffchuber avatar Jul 18 '23 02:07 jeffchuber

@jeffchuber If I use Chroma 0.3.29 and run the latest code, there will be an error, right?

SimFG avatar Jul 18 '23 02:07 SimFG

@SimFG that is correct - this new API change only supports 0.4.0 and above.

jeffchuber avatar Jul 18 '23 02:07 jeffchuber

@jeffchuber please take a look at the failed unit test

SimFG avatar Jul 18 '23 14:07 SimFG

@SimFG looks like sqlite needs to be updated - https://github.com/chroma-core/chroma/issues/836

are you all open to making this change?

jeffchuber avatar Jul 18 '23 15:07 jeffchuber

@jeffchuber I have an idea. Is it possible to let users choose through a parameter, that is, keep the previous code path by default? If you want to use Chroma 0.4.0, you can pass an additional parameter.

def __init__(
    self,
    client_settings=None,
    persist_directory=None,
    collection_name: str = "gptcache",
    top_k: int = 1,
    use_new_version: bool = False,
):
    self.top_k = top_k
    if client_settings:
        self._client_settings = client_settings
    else:
        self._client_settings = chromadb.config.Settings()
        if persist_directory is not None:
            if use_new_version:
                # Chroma >= 0.4.0: persistence is a boolean flag.
                self._client_settings = chromadb.config.Settings(
                    is_persistent=True, persist_directory=persist_directory
                )
            else:
                # Chroma < 0.4.0: persistence goes through duckdb+parquet.
                self._client_settings = chromadb.config.Settings(
                    chroma_db_impl="duckdb+parquet", persist_directory=persist_directory
                )
    self._client = chromadb.Client(self._client_settings)
    self._persist_directory = persist_directory

This minimizes the impact on existing users. Those who want the better experience of the new version can opt in by passing the parameter.

SimFG avatar Jul 19 '23 01:07 SimFG

@SimFG we could do something like what this user proposed (and which was merged) for langchain - https://github.com/hwchase17/langchain/pull/7891?
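
One way to avoid an explicit flag is to choose the settings from the installed Chroma version. The sketch below is an illustration under that assumption, not the code from the linked PR; chroma_settings_kwargs is a hypothetical helper name, and it only builds the keyword arguments so it can be read and tested without chromadb installed.

```python
from typing import Any, Dict, Optional


def chroma_settings_kwargs(
    version: str, persist_directory: Optional[str]
) -> Dict[str, Any]:
    """Pick Settings keyword arguments for a given Chroma version string."""
    if persist_directory is None:
        # No persistence requested: use the library defaults.
        return {}

    # Compare only major.minor, so "0.4.0" and "0.4.1" behave the same.
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) >= (0, 4):
        return {"is_persistent": True, "persist_directory": persist_directory}
    return {
        "chroma_db_impl": "duckdb+parquet",
        "persist_directory": persist_directory,
    }


# Intended use (requires chromadb):
#   settings = chromadb.config.Settings(
#       **chroma_settings_kwargs(chromadb.__version__, "./cache")
#   )
```

Compared with a use_new_version parameter, this keeps the constructor signature unchanged, at the cost of tying behavior to whichever version happens to be installed.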

jeffchuber avatar Jul 19 '23 02:07 jeffchuber

@jeffchuber yes you can try to do it!

SimFG avatar Jul 19 '23 03:07 SimFG

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jeffchuber To complete the pull request process, please assign cxie after the PR has been reviewed. You can assign the PR to them by writing /assign @cxie in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

sre-ci-robot avatar Jul 20 '23 17:07 sre-ci-robot

@SimFG added backwards compatibility, can you retrigger the tests?

jeffchuber avatar Jul 20 '23 17:07 jeffchuber

@jeffchuber Now the error is that the sqlite version is too low. According to the suggested solution, on Python versions below 3.10 you need to manually install a newer sqlite and swap it in. I think this is very unfriendly to users.
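
A quick way to see which sqlite library a given Python build is linked against is sketched below. The 3.35.0 minimum is an assumption here and should be checked against the Chroma documentation; sqlite_is_new_enough is a hypothetical helper name.

```python
import sqlite3

# Assumed minimum sqlite version for Chroma 0.4.x; verify against Chroma's docs.
MIN_SQLITE = (3, 35, 0)


def sqlite_is_new_enough(version_info=None, minimum=MIN_SQLITE) -> bool:
    """Return True if the given (or the linked) sqlite version meets the minimum."""
    if version_info is None:
        # Version of the sqlite library this interpreter is linked against.
        version_info = sqlite3.sqlite_version_info
    return tuple(version_info) >= tuple(minimum)


if __name__ == "__main__":
    print("linked sqlite:", sqlite3.sqlite_version)
    print("meets assumed minimum:", sqlite_is_new_enough())
```

The result depends on the sqlite library the interpreter was built against, which comes from the base OS image rather than from the Python version alone.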

SimFG avatar Jul 21 '23 07:07 SimFG

As far as I can tell, this comes down to a difference in base OS. We use python:3.10-slim-bookworm for the Docker images that run our tests. Does GPTCache use python:3.8-slim-bullseye, ubuntu-20.04, or something else?

jeffchuber avatar Jul 23 '23 14:07 jeffchuber

@jeffchuber You can solve this problem by merging the latest dev branch. If the user uses chromadb, the lower version 0.3.26 will be installed by default, because I need to ensure the availability of GPTCache. If a user wants the new features of a higher chromadb version, I believe they should also be aware of this incompatibility.

SimFG avatar Jul 24 '23 11:07 SimFG