Initial Bulk Loading Big Database With Indexing
Hi!
I am relying on the library to load some data in a PWA app which bulk loads large amount of data using the COPY command (with .csv files). I have a specific table i.e big_table which has roughly 120k rows and I have a pg_trgm index on it for fuzzy matching on the client side.
CREATE INDEX IF NOT EXISTS trgm_idx_slide_big_table
ON big_table
USING GIST ((
coalesce(remove_diacritics(content), '') || ' ' ||
coalesce(remove_diacritics(header), '')) gist_trgm_ops );
The current problem is that indexing alone takes around the same amount of time of bulk importing the data. The data import takes around ~10s and the Indexing takes around ~14s.
The whole database is around ~130 MB. Is there any other efficient way that I could use for faster insertion initially?
Hey @bishoyroufael could you please benchmark it against a PostgreSQL installation on the same/similar machine?
PostgreSQL locally works much quicker than using pglite in the browser. I am actually using pglite by loading it dynamically in my react-native app (i.e since there's no official support for it now in react-native) using:
const fetchedWasm = await fetch(wasmAsset.localUri!);
const wasmModule = await WebAssembly.compileStreaming(fetchedWasm);
const client = new PGlite({
// debug: 5,
wasmModule,
fsBundle: await fetchedFsBundle.blob(),
dataDir: "idb://hymnos-pgdata",
extensions: { pg_trgm },
relaxedDurability: true,
});
Should I even expect that pglite give the same performance as running PostgreSQL locally?
Should I even expect that
pglitegive the same performance as running PostgreSQL locally?
Definitely not, but we are always interested in making it faster and having some real-life apps benchmarks helps us with this.
dataDir: "idb://hymnos-pgdata",
This is most probably one cause of the slowness that you are seeing. We have ideas on how to improve this, but it is not a priority at the moment.
Wouldn't relaxedDurability: true eliminate the the slowness of indexedDB in the browser since persistence occurs asynchronously?
Partially. Could you try it without persistence just to get an idea?