PGVector integration
It would be nice to be able to send an embedding vector directly to PostgreSQL without having to serialize it into a string first.
// Example with Transformers.js library and PGVector extension
const transformers = await import('@xenova/transformers');
const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
const result = await similarityEmbedder(text, { pooling: 'mean', normalize: true });
const embedding = '[' + [...result.data].join(',') + ']'; // Please get rid of this step so we can send the Float32Array directly
await client.query(
`INSERT INTO post_embedding (post_id, embedding)
VALUES ($1, $2)`,
[postID, embedding]);
I would like to see this feature as well
definitely happy to work on this! if you can submit a self-contained snippet of code that works w/o needing to call any 3rd party APIs I can probably whip this out extremely quickly.
Sounds great brianc. Will the below work for you?
The embedding column below is of type Vector.
// const transformers = await import('@xenova/transformers');
// const similarityEmbedder = await transformers.pipeline('feature-extraction', 'Xenova/paraphrase-MiniLM-L6-v2');
// const result = await similarityEmbedder("The cat sat on a mat.", { pooling: 'mean', normalize: true });
// const embedding = result.data;
// Simulated 384 vector embedding from Xenova/paraphrase-MiniLM-L6-v2
const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);
await client.query(
`INSERT INTO post_embedding (post_id, embedding)
VALUES ($1, $2)`,
[postID, embedding]);
preferrably something I don't need to npm install - looks like that has a dependency on @xenova/transformers & I'm not sure what that does exactly. Basically I want to avoid taking deps on things when writing unit tests for the feature.
No those lines are commented out.
const embedding = new Float32Array(384).map(() => Math.random() * 0.4 - 0.2);
await client.query(
`INSERT INTO post_embedding (post_id, embedding)
VALUES ($1, $2)`,
[postID, embedding]);
ohhh nice - i see i see. what about the create temp table statement for the query? Would be useful to know the types of the columns. The idea is this needs to run in CI - so the more greased we can make the rails for me to dig into it the easier I can turn it around.
All good?
CREATE TABLE IF NOT EXISTS public.post_embedding
(
post_id integer NOT NULL,
embedding vector(384) NOT NULL
)
TABLESPACE pg_default;
ALTER TABLE IF EXISTS public.post_embedding
OWNER to postgres;
aye ty - i'll take a look at this soon!
On Thu, Feb 27, 2025 at 6:48 PM Bjorn Moren @.***> wrote:
All good?
CREATE TABLE IF NOT EXISTS public.post_embedding ( post_id integer NOT NULL, embedding vector(384) NOT NULL )
TABLESPACE pg_default;
ALTER TABLE IF EXISTS public.post_embedding OWNER to postgres;
— Reply to this email directly, view it on GitHub https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMHIJ6RXVJXUXEFNNE46T2R6B3JAVCNFSM6AAAAABTPBQPYSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBZGE4DGOBTGY . You are receiving this because you commented.Message ID: @.***> [image: BjornMoren]BjornMoren left a comment (brianc/node-postgres#3351) https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836
All good?
CREATE TABLE IF NOT EXISTS public.post_embedding ( post_id integer NOT NULL, embedding vector(384) NOT NULL )
TABLESPACE pg_default;
ALTER TABLE IF EXISTS public.post_embedding OWNER to postgres;
— Reply to this email directly, view it on GitHub https://github.com/brianc/node-postgres/issues/3351#issuecomment-2689183836, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMHIJ6RXVJXUXEFNNE46T2R6B3JAVCNFSM6AAAAABTPBQPYSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBZGE4DGOBTGY . You are receiving this because you commented.Message ID: @.***>
Sorry, I should have mentioned that your PG installation must have the PGVector extension installed, or the column type "vector" is not defined.
https://github.com/pgvector/pgvector
hmmm this is not straightforward & I'm not sure there's a backwards compatible way to do this actually. I just tried but it doesn't work because node-postgres already automatically converts ArrayBuffers (and typed arrays) to be compatible with the BYTEA column type (binary). You can try mapping your typed array to a normal javascript array of numbers....that might work?
I can't get that to work unfortunately.
What my example above shows is a standard operation with LLMs (Large Language Models), which are becoming very popular due to the explosion of AI. They process large amounts of data. In this case a vector of 384 dimensions, but other more popular models have 2000 or even more dimensions. So this kind of stuff is coming more and more and requires an efficient way to do this.
On the other hand, when high performance is needed then normally you move away from doing these operations in Node and instead install a proper LLM server. I guess it is your call if this is something you think your PG library should have.