add supabase and postgres + pgvector datastore providers

Open egor-romanov opened this issue 1 year ago • 8 comments

What kind of change does this PR introduce?

  • [x] add supabase as a datastore provider
  • [x] add pure postgres as a datastore provider
  • [x] add necessary migrations with
    • [x] Postgres function to search closest vectors using pgvector extension
    • [x] table to store embeddings
    • [x] enable pgvector postgres extension
  • [x] update readme
  • [x] add local setup guide for supabase datastore provider and postgres datastore provider
  • [x] added simple tests for both
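
For readers who have not used pgvector, here is a rough sketch of what the three migration pieces listed above could look like. This is not the SQL shipped in this PR: the table name `documents`, the function name `match_documents`, the column names, and the 1536 dimension are all assumptions made for illustration. The `vector` extension, the `vector(n)` column type, and the `<=>` cosine distance operator are real pgvector features. The SQL is kept in a Python string alongside a small hypothetical helper that formats an embedding as a pgvector text literal.

```python
# Hypothetical migration sketch; names and dimension are assumptions,
# not the SQL from this PR.
MIGRATION_SQL = """
-- 1. enable the pgvector extension
create extension if not exists vector;

-- 2. table to store embeddings
create table if not exists documents (
    id text primary key,
    content text,
    embedding vector(1536)  -- assumed embedding dimension
);

-- 3. function to search the closest vectors (cosine distance via <=>)
create or replace function match_documents(
    query_embedding vector(1536),
    match_count int
)
returns table (id text, content text, similarity float)
language sql stable
as $$
    select d.id, d.content, 1 - (d.embedding <=> query_embedding) as similarity
    from documents d
    order by d.embedding <=> query_embedding
    limit match_count;
$$;
"""


def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


literal = to_vector_literal([1, 2.5])
```

The migration would be run once against the database; afterwards, queries only need to call the search function with an embedding and a result count.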

Hey, OpenAI team 👋 thanks for this great service template and plugin!

egor-romanov avatar Mar 26 '23 13:03 egor-romanov

Supabase is just one of the hosted PostgreSQL options that supports pgvector. There are options listed here.

A user of this project could also choose to use their own PostgreSQL instance with pgvector enabled.

I believe in a more abstract implementation and the README/docs can mention the hosted providers.

mmmaia avatar Mar 26 '23 23:03 mmmaia

> Supabase is just one of the hosted PostgreSQL options that supports pgvector. There are options listed here.
>
> A user of this project could also choose to use their own PostgreSQL instance with pgvector enabled.
>
> I believe in a more abstract implementation and the README/docs can mention the hosted providers.

One does not replace the other. This extension uses PostgREST to access the database. Both datastores, plain Postgres and Supabase, would be great additions to this project :)

egor-romanov avatar Mar 27 '23 06:03 egor-romanov

@egor-romanov sorry that I overlooked it.

As I see it, this implementation ends up using both PostgREST and a PostgreSQL connection (init_db), which requires more configuration from the user.

The other implementations require just a DATABASE_URL and work fine with Supabase, other hosted providers, and self-hosted instances.

mmmaia avatar Mar 27 '23 15:03 mmmaia

> @egor-romanov sorry that I overlooked it.
>
> As I see it, this implementation ends up using both PostgREST and a PostgreSQL connection (init_db), which requires more configuration from the user.
>
> The other implementations require just a DATABASE_URL and work fine with Supabase, other hosted providers, and self-hosted instances.

Hey, thanks. But it is still notably different, as init_db is not required, and using it is even discouraged :)

It exists only to run migrations, which is needed only once and is much better done as a separate process.

The actual work is done via PostgREST requests and does not require any additional setup. You can find the same info in the readme. In fact, it requires much less: only the URL of the Supabase project and the anon_key.

As I mentioned earlier, both datastore providers would be nice additions to the plugin, as they work quite differently (one uses a pg connection, the other uses REST over HTTP with PostgREST).
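
To make the "REST over HTTP" side concrete, here is a minimal sketch of the kind of request a PostgREST-backed provider sends. It is not the code from this PR: the helper `build_rpc_request`, the function name `match_documents`, and the example URL and key are hypothetical. The `/rest/v1/rpc/<function>` path and the `apikey` plus `Authorization: Bearer` headers follow the usual Supabase REST conventions. The sketch only builds the request and sends nothing, so it runs without a project.

```python
import json


def build_rpc_request(project_url, anon_key, function_name, params):
    """Describe a PostgREST RPC call; no DATABASE_URL or pg driver is involved."""
    return {
        "method": "POST",
        "url": f"{project_url.rstrip('/')}/rest/v1/rpc/{function_name}",
        "headers": {
            "apikey": anon_key,
            "Authorization": f"Bearer {anon_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps(params),
    }


# Example values are placeholders, not real credentials.
request = build_rpc_request(
    "https://example.supabase.co",
    "public-anon-key",
    "match_documents",  # hypothetical function name
    {"query_embedding": [0.1, 0.2, 0.3], "match_count": 3},
)
```

The only configuration involved is the project URL and the anon key; a direct-connection provider would instead need a connection string and a Postgres driver.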

egor-romanov avatar Mar 27 '23 15:03 egor-romanov

Hey @mmmaia, Supabase CEO here. You're right that we could provide a connection directly to the database.

Just to add a bit more color to our approach:

  • For security, we advise our users to restrict direct access to the database from the internet and only expose access through the API (PostgREST).
  • PostgREST has a built-in connection pooler. Without pooling, if 1000 users run a retrieval command concurrently, it will crash the database (unless OpenAI is doing some sort of connection pooling on your side?). Postgres connections unfortunately don't scale well, and we often see users crash their database when they connect directly from serverless environments.

We can also push a generic Postgres retrieval plugin if you prefer, although it won't be as secure or scalable and will probably result in some support headaches for us.
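
The pooling point above can be illustrated with a toy sketch. It does not show how PostgREST implements its pooler; `FakeConnection` and `TinyPool` are hypothetical stand-ins that do no I/O. The idea is only that 1000 requests can share a handful of connections instead of opening 1000 of them.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real Postgres connection (hypothetical, does no I/O)."""

    opened = 0  # how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def query(self, text):
        return f"result for {text}"


class TinyPool:
    """Fixed-size pool: callers wait for a free connection instead of opening one."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


pool = TinyPool(size=5)


def handle_request(i):
    with pool.connection() as conn:
        return conn.query(f"request {i}")


# 1000 requests served by 50 worker threads, all sharing 5 connections.
with ThreadPoolExecutor(max_workers=50) as executor:
    results = list(executor.map(handle_request, range(1000)))
```

Without the pool, each of the 1000 requests would open its own connection, which is the failure mode described for direct connections from serverless environments.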

kiwicopple avatar Mar 29 '23 14:03 kiwicopple

Hi, @isafulf!

Thanks for this amazing project, I just wanted to make a quick update about this PR:

  1. Datastore provider docs have been moved to match the new structure
  2. A full local setup of the Supabase datastore provider is now possible with the new guide
  3. Added a few simple tests for upsert/query/delete

egor-romanov avatar Apr 14 '23 20:04 egor-romanov

Apologies for the bluntness, but this comes across as a sort of cheap advert for Supabase at the expense of pgvector users: an attempt to capture pgvector users into the Supabase ecosystem, without supporting the open source that Supabase leverages. It's nice to support pgvector in Supabase, but why not separate the pgvector support from the Supabase support so that pure pgvector users who don't need a cloud solution can benefit too?

Supabase might benefit from users choosing pgvector in that case too. Or figure out how to provide Supabase support on top of #65 or #45 instead? One can well imagine that this project's owners do not want to support two different pgvector implementations.

mtesch-um avatar May 05 '23 15:05 mtesch-um

@mtesch-um you should check the PR and the comments above to understand why this PR exists :)

But I can address these misunderstandings:

> Apologies for the bluntness, but this comes across as a sort of cheap advert for Supabase at the expense of pgvector users: an attempt to capture pgvector users into the Supabase ecosystem, without supporting the open source that Supabase leverages.

This extension is called Supabase, not pgvector. It benefits from Supabase features like PostgREST, which are not available when using plain Postgres.

> It's nice to support pgvector in Supabase, but why not separate the pgvector support from the Supabase support so that pure pgvector users who don't need a cloud solution can benefit too?

See the comment from kiwicopple above to understand the benefits of using PostgREST, for example. Plus, you can check the docs in this PR to see how to set everything up locally or self-host, so you don't need the cloud; everything is still open source and self-hostable.

> Supabase might benefit from users choosing pgvector in that case too. Or figure out how to provide Supabase support on top of #65 or #45 instead? One can well imagine that this project's owners do not want to support two different pgvector implementations.

I am making this PR understanding that I am responsible for its maintenance.

This is not a race of any kind. I am super happy that there is a pure Postgres-only extension, and I don't want to compete with it in any way. If you need Postgres only, that is good. If you want to create a project using Supabase, that is also good, and this extension uses Supabase features such as PostgREST and the Python dependencies for it.

If that is still not convincing, just take a look at the existing providers, Zilliz and Milvus to be specific.

egor-romanov avatar May 05 '23 20:05 egor-romanov