node-postgres PGVector integration

It would be nice to be able to send an embedding vector directly to PostgreSQL without having to serialize it into a string first.

// Example with Transformers.js library and PGVector extension
const transformers = await import('@xenova/transformers');
const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
const result = await similarityEmbedder(text, { pooling: 'mean', normalize: true });

const embedding = '[' + [...result.data].join(',') + ']';   // Please get rid of this step so we can send the Float32Array directly 

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);

Dec 12 '24 06:12 BjornMoren

I would like to see this feature as well

Feb 21 '25 14:02 dannyb101

definitely happy to work on this! if you can submit a self-contained snippet of code that works w/o needing to call any 3rd party APIs I can probably whip this out extremely quickly.

Feb 27 '25 21:02 brianc

Sounds great brianc. Will the below work for you?

The embedding column below is of type Vector.

// const transformers = await import('@xenova/transformers');
// const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
// const result = await similarityEmbedder("The cat sat on a mat.", { pooling: 'mean', normalize: true });
// const embedding = result.data;

// Simulated 384 vector embedding from Xenova/paraphrase-MiniLM-L6-v2 
const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);

Feb 27 '25 21:02 BjornMoren

preferrably something I don't need to npm install - looks like that has a dependency on @xenova/transformers & I'm not sure what that does exactly. Basically I want to avoid taking deps on things when writing unit tests for the feature.

Feb 27 '25 21:02 brianc

No those lines are commented out.

const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);

await client.query(
   `INSERT INTO post_embedding (post_id, embedding) 
   VALUES ($1, $2)`,
   [postID, embedding]);

Feb 27 '25 21:02 BjornMoren

ohhh nice - i see i see. what about the create temp table statement for the query? Would be useful to know the types of the columns. The idea is this needs to run in CI - so the more greased we can make the rails for me to dig into it the easier I can turn it around.

Feb 27 '25 21:02 brianc

All good?

CREATE TABLE IF NOT EXISTS public.post_embedding
(
    post_id integer NOT NULL,
    embedding vector(384) NOT NULL
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.post_embedding
    OWNER to postgres;

Feb 27 '25 21:02 BjornMoren

aye ty - i'll take a look at this soon!

On Thu, Feb 27, 2025 at 6:48 PM Bjorn Moren @.***> wrote:

All good?

CREATE TABLE IF NOT EXISTS public.post_embedding ( post_id integer NOT NULL, embedding vector(384) NOT NULL )

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.post_embedding OWNER to postgres;

— Reply to this email directly, view it on GitHub https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMHIJ6RXVJXUXEFNNE46T2R6B3JAVCNFSM6AAAAABTPBQPYSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBZGE4DGOBTGY . You are receiving this because you commented.Message ID: @.***> [image: BjornMoren]BjornMoren left a comment (brianc/node-postgres#3351) https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836

All good?

CREATE TABLE IF NOT EXISTS public.post_embedding ( post_id integer NOT NULL, embedding vector(384) NOT NULL )

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.post_embedding OWNER to postgres;

— Reply to this email directly, view it on GitHub https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMHIJ6RXVJXUXEFNNE46T2R6B3JAVCNFSM6AAAAABTPBQPYSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBZGE4DGOBTGY . You are receiving this because you commented.Message ID: @.***>

Feb 27 '25 21:02 brianc

Sorry, I should have mentioned that your PG installation must have the PGVector extension installed, or the column type "vector" is not defined.

https://github.com/pgvector/pgvector

Feb 27 '25 21:02 BjornMoren

hmmm this is not straightforward & I'm not sure there's a backwards compatible way to do this actually. I just tried but it doesn't work because node-postgres already automatically converts ArrayBuffers (and typed arrays) to be compatible with the BYTEA column type (binary). You can try mapping your typed array to a normal javascript array of numbers....that might work?

Feb 27 '25 22:02 brianc

I can't get that to work unfortunately.

What my example above shows is a standard operation with LLMs (Large Language Models), which are becoming very popular due to the explosion of AI. They process large amounts of data. In this case a vector of 384 dimensions, but other more popular models have 2000 or even more dimensions. So this kind of stuff is coming more and more and requires an efficient way to do this.

On the other hand, when high performance is needed then normally you move away from doing these operations in Node and instead install a proper LLM server. I guess it is your call if this is something you think your PG library should have.

Feb 27 '25 22:02 BjornMoren