pglite icon indicating copy to clipboard operation
pglite copied to clipboard

Client-side pglite struggles with large (~5000 row) import

Open lucasgelfond opened this issue 8 months ago • 8 comments
trafficstars

Hi! Trying to push pglite to its limits, a bit—I want to do hybrid search (w/ embeddings + full-text) all client-side in the browser, on a ~5000 row table. The table has an 384-dim embedding and fts index on each, and I have tested this works. The real issue is wrangling initialization. The database lives in Supabase right now —I didn't want to bother with converting formats, so I just downloaded the full JSON, and imported row by row to pglite (takes ~2 mins) like this:

	for (const row of rows) {
		const { parent_issue_id, page_number, ocr_result, embedding, image_url } = row;
		if (parent_issue_id && page_number && ocr_result) {
			await db.query(
				`INSERT INTO page (parent_issue_id, page_number, ocr_result, embedding, image_url) 
				VALUES ($1, $2, $3, $4, $5)`,
				[parent_issue_id, page_number, ocr_result, embedding, image_url]
			);
		}
	}
};

My thought was this would be a quick stop-gap/MacGyver-y solution: do this once, save the pglite compatible tarball, and serve this (50MB) by CDN for users, which will make actual search very, very fast. My issue is that, while I am having no problem saving a tarball, I can't manage to quickly import it.

My import code has (for testing) consisted of a file input and then some logic to instantiate a new db.

		const newDb = new PGlite('idb://supa-semantic-search', {
			loadDataDir: dumpFile,
			extensions: {
				vector,
				uuid_ossp
			},
			debug: 5
		});

The file seems to go through, etc, the problem is that I hang for several minutes (never gotten to completion!) on await newDb.waitReady;. It sounded like dev/blob was another option, but it's less immediately suitable for me because I'm using two tables (it's for a magazine: page and issue). This also proved more difficult than anticipated to get working. I guess one way I could do this would be to lock clients out for ~2 minutes and do that really slow manual loading, but this also seems quite sub-ideal. I followed the https://pglite.dev/examples/dump-data-dir example pretty much to a tee and still no luck. Happy to send over code and data to reproduce if helpful, there may also be some better way to do this?

lucasgelfond avatar Mar 12 '25 03:03 lucasgelfond