Client-side pglite struggles with large (~5000 row) import
Hi! Trying to push pglite to its limits a bit: I want to do hybrid search (embeddings + full-text) entirely client-side in the browser, on a ~5000 row table. The table has a 384-dim embedding column and a full-text search index, and I have tested that this works. The real issue is wrangling initialization. The database lives in Supabase right now; I didn't want to bother with converting formats, so I just downloaded the full JSON and imported it row by row into pglite (takes ~2 mins) like this:
for (const row of rows) {
  const { parent_issue_id, page_number, ocr_result, embedding, image_url } = row;
  if (parent_issue_id && page_number && ocr_result) {
    await db.query(
      `INSERT INTO page (parent_issue_id, page_number, ocr_result, embedding, image_url)
       VALUES ($1, $2, $3, $4, $5)`,
      [parent_issue_id, page_number, ocr_result, embedding, image_url]
    );
  }
}
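(As an aside: I realize I could probably speed the one-time import up by wrapping the loop in a single transaction, so there isn't a commit per row. An untested sketch, assuming pglite's transaction API:)

// Untested sketch: one transaction for the whole import instead of one commit per row.
await db.transaction(async (tx) => {
  for (const row of rows) {
    const { parent_issue_id, page_number, ocr_result, embedding, image_url } = row;
    if (parent_issue_id && page_number && ocr_result) {
      await tx.query(
        `INSERT INTO page (parent_issue_id, page_number, ocr_result, embedding, image_url)
         VALUES ($1, $2, $3, $4, $5)`,
        [parent_issue_id, page_number, ocr_result, embedding, image_url]
      );
    }
  }
});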
My thought was that this would be a quick stop-gap/MacGyver-y solution: do the slow import once, save the pglite-compatible tarball, and serve it (~50 MB) from a CDN to users, which would make the actual search very, very fast. My issue is that, while I'm having no problem saving a tarball, I can't manage to import it quickly.
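For reference, the dump side is essentially the dump-data-dir example from the docs, and it works for me. A sketch (the 'gzip' compression argument is my assumption from the docs, which also list 'auto' and 'none'):

// Dump the data dir to a tarball; this part works fine for me.
const dumpFile = await db.dumpDataDir('gzip');
// dumpFile is a Blob/File suitable for uploading to the CDN.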
My import code (for testing) consists of a file input plus some logic to instantiate a new db:
import { PGlite } from '@electric-sql/pglite';
import { vector } from '@electric-sql/pglite/vector';
import { uuid_ossp } from '@electric-sql/pglite/contrib/uuid_ossp';

// dumpFile is the File picked via the file input
const newDb = new PGlite('idb://supa-semantic-search', {
  loadDataDir: dumpFile,
  extensions: {
    vector,
    uuid_ossp
  },
  debug: 5
});
The file seems to go through, etc.; the problem is that I hang for several minutes (never gotten to completion!) on await newDb.waitReady;. It sounded like /dev/blob was another option, but it's less immediately suitable for me because I'm using two tables (it's for a magazine: page and issue), and it also proved more difficult than anticipated to get working; my rough understanding of that route is sketched below. I guess one way I could do this would be to lock clients out for ~2 minutes and do that really slow manual loading, but this also seems quite sub-ideal. I followed the https://pglite.dev/examples/dump-data-dir example pretty much to a tee and still no luck. Happy to send over code and data to reproduce if helpful; there may also be some better way to do this?
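For completeness, here's my rough, untested understanding of what the /dev/blob route would look like for two tables: one COPY round-trip and one blob per table, with the schema created separately beforehand (the issueBlob/pageBlob names are mine):

// Untested sketch of the /dev/blob route: one blob per table.

// Export (done once, offline):
const issueOut = await db.query("COPY issue TO '/dev/blob';");
const pageOut = await db.query("COPY page TO '/dev/blob';");
// Per the docs, the data lands on the results object as issueOut.blob / pageOut.blob;
// those two Blobs would be what gets uploaded to the CDN.

// Import (client-side, after creating both tables with the right schema):
await newDb.query("COPY issue FROM '/dev/blob';", [], { blob: issueBlob });
await newDb.query("COPY page FROM '/dev/blob';", [], { blob: pageBlob });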