drizzle-orm icon indicating copy to clipboard operation
drizzle-orm copied to clipboard

[FEATURE]: add PgVector support

Open ianjdarrow opened this issue 1 year ago • 12 comments

Support for PGVector, a popular extension for vectors and vector comparisons

AI projects are quite popular right now and often involve working with vector databases. The PgVector extension (https://github.com/pgvector/pgvector) has pg, sequelize, and prisma support, but currently none for drizzle-orm.

I spent a while trying to apply the custom types example to add PgVector support, but haven't been able to get it working yet. If anyone with more familiarity is able to take a look at it, it would unblock a use case for me, and I suspect many others!

Thank you to the team for your work on drizzle-orm :)

ianjdarrow avatar May 08 '23 20:05 ianjdarrow

Update: progress! The following type definition works, at least for CRUD operations.

I don't use drizzle-kit so have not tested it with table generation.

Posting here for comments and improvements, and in case useful to others. Happy to submit the type if it would be useful to folks.

import { customType } from "drizzle-orm/pg-core";

export const vector = customType<{
  data: number[];
  driverData: string;
  config: { size: number };
}>({
  dataType(config) {
    const dt =
      !!config && typeof config.size === "number"
        ? `vector(${config.size})`
        : "vector";
    return dt;
  },
  fromDriver(value) {
    return JSON.parse(value);
  },
  toDriver(value) {
    return JSON.stringify(value);
  },
});

ianjdarrow avatar May 09 '23 17:05 ianjdarrow

Thanks @ianjdarrow! I added experimental support to pgvector-node (but it's not released yet). I tried using both customType as well as PgColumn/PgColumnBuilder, but wasn't able to get either working with Drizzle Kit. If someone is able to figure it out, please submit a PR.

ankane avatar Jun 01 '23 02:06 ankane

What is the status on pgvector in drizzle on this day?

m10rten avatar Aug 06 '23 10:08 m10rten

Any updates? Thanks

danielreuter avatar Nov 08 '23 03:11 danielreuter

Generated columns should have type vector(dimensions), not "vector(dimensions)"

I think it need to fix the drizzle-kit but I cannot access the source :/

cometkim avatar Dec 11 '23 07:12 cometkim

So I've removed the double quotes from the vector(1536) in my migration.sql file, but it still doesn't want to push it to the database. I'm getting the error: type "vector(1536)" does not exist error. (using drizzle-kit)

I was able to push it when changing it from vector(1536) to vector.

Is it a bad idea to store a vector as a vector instead of as a vector(1536)?

mattiasbonte avatar Dec 23 '23 20:12 mattiasbonte

Until drizzle-kit properly supports non-native pg types, you can patch drizzle-kit yourself to add e.g. vector support.

This assumes you are using pnpm to manage your dependencies and also that you're running the latest drizzle-kit version.

Path: ./patches/[email protected]

diff --git a/bin.cjs b/bin.cjs
index 371b26746840f9b5da1b63f763dd96cc569e7713..a9c498a6ab9e0e449c4a871a1788d138aaa24f87 100755
--- a/bin.cjs
+++ b/bin.cjs
@@ -15346,7 +15346,7 @@ var init_sqlgenerator = __esm({
       if (pgNativeTypes.has(it))
         return true;
       const toCheck = it.replace(/ /g, "");
-      return toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);
+      return toCheck.startsWith("vector(") || toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);
     };
     Convertor = class {
     };
diff --git a/utils.js b/utils.js
index 3322ed988c7752ddc3c46f051db77960b5903bf5..f8c9ba642383b7e469446801509a88b78f2ef613 100644
--- a/utils.js
+++ b/utils.js
@@ -15063,7 +15063,7 @@ var init_sqlgenerator = __esm({
       if (pgNativeTypes.has(it))
         return true;
       const toCheck = it.replace(/ /g, "");
-      return toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);
+      return toCheck.startsWith("vector(") || toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);
     };
     Convertor = class {
     };

Then add this to you package.json

  "pnpm": {
    "patchedDependencies": {
      "[email protected]": "patches/[email protected]"
    }
  }

You can use pnpm patch if you need to fix this for any other version of drizzle-kit.

matthiasbayer avatar Jan 12 '24 09:01 matthiasbayer

Until drizzle-kit properly supports non-native pg types, you can patch drizzle-kit yourself to add e.g. vector support.

This assumes you are using pnpm to manage your dependencies and also that you're running the latest drizzle-kit version.

Path: ./patches/[email protected]


diff --git a/bin.cjs b/bin.cjs

index 371b26746840f9b5da1b63f763dd96cc569e7713..a9c498a6ab9e0e449c4a871a1788d138aaa24f87 100755

--- a/bin.cjs

+++ b/bin.cjs

@@ -15346,7 +15346,7 @@ var init_sqlgenerator = __esm({

       if (pgNativeTypes.has(it))

         return true;

       const toCheck = it.replace(/ /g, "");

-      return toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);

+      return toCheck.startsWith("vector(") || toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);

     };

     Convertor = class {

     };

diff --git a/utils.js b/utils.js

index 3322ed988c7752ddc3c46f051db77960b5903bf5..f8c9ba642383b7e469446801509a88b78f2ef613 100644

--- a/utils.js

+++ b/utils.js

@@ -15063,7 +15063,7 @@ var init_sqlgenerator = __esm({

       if (pgNativeTypes.has(it))

         return true;

       const toCheck = it.replace(/ /g, "");

-      return toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);

+      return toCheck.startsWith("vector(") || toCheck.startsWith("varchar(") || toCheck.startsWith("char(") || toCheck.startsWith("numeric(") || toCheck.startsWith("timestamp(") || toCheck.startsWith("intervalyear(") || toCheck.startsWith("intervalmonth(") || toCheck.startsWith("intervalday(") || toCheck.startsWith("intervalhour(") || toCheck.startsWith("intervalminute(") || toCheck.startsWith("intervalsecond(") || toCheck.startsWith("intervalyeartomonth(") || toCheck.startsWith("intervaldaytohour(") || toCheck.startsWith("intervaldaytominute(") || toCheck.startsWith("intervaldaytosecond(") || toCheck.startsWith("intervalhourtominute(") || toCheck.startsWith("intervalhourtosecond(") || toCheck.startsWith("intervalminutetosecond(") || /^(\w+)(\[\d*])+$/.test(it);

     };

     Convertor = class {

     };



Then add this to you package.json


  "pnpm": {

    "patchedDependencies": {

      "[email protected]": "patches/[email protected]"

    }

  }



You can use pnpm patch if you need to fix this for any other version of drizzle-kit.

Possible to expand a bit how to apply this for those of us who are not quite as familiar with patch?

RobertHH-IS avatar Jan 12 '24 18:01 RobertHH-IS

Damn, back to Primsa ;)

homerjam avatar Jan 20 '24 00:01 homerjam

I'm not seeing nor hearing about any pgvector updates from Drizzle. So yes, Prisma might win this one (for now).

m10rten avatar Jan 20 '24 08:01 m10rten

Hi Drizzle ORM Team, any updates one vector support?

neilord avatar Apr 15 '24 20:04 neilord

fwiw they are working on it, Andrew posted a screenshot on twitter that included showed a pgvector index in drizzle, as well as other vector stuff - so it seems like it's a work-in-progress.

K-Mistele avatar May 07 '24 17:05 K-Mistele