chatgpt-retrieval-plugin
add supabase and postgres + pgvector datastore providers
What kind of change does this PR introduce?
- [x] add supabase as a datastore provider
- [x] add pure postgres as a datastore provider
- [x] add necessary migrations with:
  - [x] Postgres function to search closest vectors using pgvector extension
  - [x] table to store embeddings
  - [x] enable pgvector postgres extension
- [x] update readme
- [x] add local setup guide for supabase datastore provider and postgres datastore provider
- [x] added simple tests for both
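For readers unfamiliar with what a "search closest vectors" function computes, here is a minimal pure-Python sketch. It assumes cosine distance (the quantity pgvector's `<=>` operator returns); the PR's actual SQL function and distance metric are not shown in this thread, and the names `closest_vectors`, `rows`, and the sample vectors are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity (what pgvector's <=> returns)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_vectors(query, rows, top_k=3):
    """Return the top_k (id, distance) pairs, nearest first."""
    scored = [(row_id, cosine_distance(query, emb)) for row_id, emb in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


# Hypothetical stored embeddings: (id, vector) pairs standing in for table rows.
rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7])]
result = closest_vectors([1.0, 0.1], rows, top_k=2)
print(result)
```

In the database the same ordering would typically come from an `ORDER BY` on the distance operator with a `LIMIT`, so the sort above is only a stand-in for what Postgres does server-side.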
Hey, OpenAI team 👋 thanks for this great service template and plugin!
Supabase is just one of the hosted PostgreSQL options that supports pgvector. There are options listed here.
A user of this project could also choose to use their own PostgreSQL instance with pgvector enabled.
I believe in a more abstract implementation and the README/docs can mention the hosted providers.
One does not replace the other. This extension uses PostgREST to access the database. Both datastores, plain postgres and Supabase, would be great additions to this project :)
@egor-romanov sorry that I overlooked it.
As I see it, this implementation ends up using PostgREST and a PostgreSQL connection (init_db), which requires more configuration from the user.
The other implementations require just a DATABASE_URL and work fine with Supabase and other hosted providers, plus self-hosted instances.
Hey, thanks. But it is still notably different, as init_db is not required and its use is even discouraged :)
It only runs migrations, which is needed just once and is much better done as a separate process.
The actual work is done via PostgREST requests and does not require any additional setup. You can find the same info in the readme. In fact, it requires much less: only the URL of the supabase project and the anon_key.
As I mentioned earlier, both datastore providers would be nice additions to the plugin, as they work quite differently (using a pg connection versus REST over HTTP with PostgREST).
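To make the "REST over HTTP" path concrete, here is a rough sketch of how a PostgREST function call against a Supabase project can be assembled from just a project URL and an anon key. It only builds the request and never sends it. The project URL, the key, the function name `match_documents`, and its parameters are hypothetical placeholders, not names taken from this PR.

```python
import json
import urllib.request


def build_rpc_request(project_url, anon_key, function_name, params):
    """Build (but do not send) a PostgREST RPC call for a Supabase project."""
    # Supabase serves PostgREST under /rest/v1; functions are called via /rpc/<name>.
    url = f"{project_url.rstrip('/')}/rest/v1/rpc/{function_name}"
    headers = {
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_rpc_request(
    "https://example-project.supabase.co",  # hypothetical project URL
    "example-anon-key",                     # hypothetical anon key
    "match_documents",                      # hypothetical function name
    {"query_embedding": [0.1, 0.2, 0.3], "match_count": 3},
)
print(request.get_method(), request.full_url)
```

A direct-connection provider would instead open a Postgres session from a DATABASE_URL, which is the configuration difference discussed above.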
Hey @mmmaia, supabase ceo here. You're right that we could provide a connection directly to the database.
Just to add a bit more color to our approach:
- For security, we advise our users to restrict direct access to the database from the internet and only expose access through the API (PostgREST).
- PostgREST has a built-in connection pooler. Without pooling, if 1000 users run a retrieval command concurrently, it will crash the database (unless OpenAI is doing some sort of connection pooling on your side?). Postgres connections unfortunately don't scale well, and we often see users crash their database when they connect directly from serverless environments.
We can also push a generic Postgres retrieval plugin if you prefer, although it won't be as secure or scalable and will probably result in some support headaches for us.
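The pooling point above can be illustrated with a small sketch: many concurrent callers share a fixed number of connections instead of each opening their own. This is a toy model only, assuming a stand-in `FakeConnection` class with no real I/O; it is not how PostgREST implements its pool, and `BoundedPool` is a hypothetical name.

```python
import queue
import threading


class FakeConnection:
    """Stand-in for a real database connection (hypothetical, performs no I/O)."""

    def execute(self, sql):
        return f"ran: {sql}"


class BoundedPool:
    """Hands out at most `size` connections; extra callers wait their turn."""

    def __init__(self, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    def run(self, sql):
        conn = self._idle.get()  # blocks while every connection is busy
        try:
            return conn.execute(sql)
        finally:
            self._idle.put(conn)  # always return the connection to the pool


pool = BoundedPool(size=5)
results = []
results_lock = threading.Lock()


def worker(i):
    out = pool.run(f"select {i}")
    with results_lock:
        results.append(out)


# 200 concurrent callers, but the database side never sees more than 5 connections.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(200)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))
```

Without the pool, each of those callers would hold its own Postgres connection, which is the failure mode described for serverless environments.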
Hi, @isafulf!
Thanks for this amazing project, I just wanted to make a quick update about this PR:
- Datastore provider docs have been moved to match the new structure
- Full local setup of the supabase datastore provider is now possible with the new guide
- Added a few simple tests for upsert/query/delete
Apologies for the bluntness, but this comes across as a sort of cheap advert for supabase at the expense of pgvector users - an attempt to capture pgvector users into the supabase ecosystem, without supporting the open source that supabase leverages. It's nice to support pgvector in supabase, but why not separate the pgvector support from the supabase support so that pure pgvector users who don't need a cloud solution can benefit too?
Supabase might benefit from users choosing pgvector in that case too. Or figure out how to provide supabase support on top of #65 or #45 instead? One can well imagine that this project's owners do not want to support two different pgvector implementations.
@mtesch-um you should check the PR and the comments above to understand why this PR exists :)
But I can address these misunderstandings:
> Apologies for the bluntness, but this comes across as a sort of cheap advert for supabase at the expense of pgvector users - an attempt to capture pgvector users into the supabase ecosystem, without supporting the open source that supabase leverages.
This extension is called Supabase, not PgVector. And it benefits from supabase features like PostgREST, which are not available when using Postgres alone.
> It's nice to support pgvector in supabase, but why not separate the pgvector support from the supabase support so that pure pgvector users who don't need a cloud solution can benefit too?
See the comment from kiwicopple above to understand the benefits of using PostgREST, for example. Plus, you can check out the docs in this PR to see how to set everything up locally or self-host, so you don't need the cloud; everything is still open source and self-hostable.
> Supabase might benefit from users choosing pgvector in that case too. Or figure out how to provide supabase support on top of #65 or #45 instead? One can well imagine that this project's owners do not want to support two different pgvector implementations.
I am making this PR understanding that I am responsible for its maintenance.
This is not a race of any kind. I am super happy that there is a pure postgres extension, and I don't want to compete with it in any way. If you need postgres only, that is good. If you want to create a project using supabase, that is also good, and this extension uses supabase features such as PostgREST and the python dependencies for it.
If that is still not convincing, just take a look at the existing providers, Zilliz and Milvus to be specific.