pglite for testing - memory consumption issues?
I love the idea of using PGlite for testing, since I can configure my app to run without any external dependencies. I just init PGlite in memory instead of a Docker Postgres instance, which simplifies my CI/CD, local runs, etc.

But I'm hitting some issues: my app is a pretty large monolith with ~30 test files, and each file has ~10 tests.

Initially I was naively creating a PGlite instance for each test (in `beforeEach`), but this destroyed my RAM. Then I moved to a model where I create one PGlite instance, run migrations, prepare some initial data, and save that as a "golden instance"; for each test I call `pglite.clone()` to get a fresh copy. This still kills my machine in terms of memory, so the next optimization would be to reuse the same PGlite instance for every test inside a test file... but that breaks the guarantee that each test does not affect the tests before or after it.

What are your tips for using PGlite in CI/CD for large test codebases? Maybe some examples from the wild?
Hi, +1 here because I'm currently working on this.

- I tried using `pglite-tools/dump`, but it did not write the users/roles that I needed with the DB as well
- I tried using `clone`, but it does not preserve the extensions
- I might be able to use `dumpDataDir`
Update:

Since `clone` is just a thin wrapper around `dumpDataDir` and `create`, we should allow for some optional params:

```ts
async clone(): Promise<PGliteInterface> {
  const dump = await this.dumpDataDir('none')
  return PGlite.create({ loadDataDir: dump })
}
```
```ts
freshPgLite = await PGlite.create({
  loadDataDir: goldenDataDirDump,
  extensions: {
    uuid_ossp,
  },
});
```

This starts taking 5+ seconds in a 200+ test system where each test loads its own clean PGlite instance.
Dealing with the same challenge: many tests in a file, and each test wants a copy of a clean database. But something in PGlite prevents parallel initialization, so even if the tests run in parallel, they all get stuck waiting for their copy of the database, which is handed out sequentially and slows down the test suite.

Here's a simple example - both the "parallel" and "sequential" flows take the same time to finish (~2.5 s for me):
```js
import { PGlite } from "@electric-sql/pglite";

const protoDb = await PGlite.create();
// ... here we would seed the prototype DB with the schema + data required for tests
const dataDir = await protoDb.dumpDataDir("none"); // save the golden image to re-use in tests

const iterations = Array(20).fill(0).map((_, idx) => idx);

console.time("Promise.all");
// "Parallel" execution ends up being sequential - each call waits
// longer and longer on the previous calls
await Promise.all(iterations.map(runTest));
console.timeEnd("Promise.all");

// Regular sequential execution
console.time("sequential");
for (const idx of iterations) await runTest(idx);
console.timeEnd("sequential");

async function runTest(testId) {
  console.time(`test(${testId})`);
  await PGlite.create({ loadDataDir: dataDir });
  console.timeEnd(`test(${testId})`);
}
```
The only solution I found so far is to split up my large test files to have fewer tests per file (so they run in different processes), then the entire suite speeds up.
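The thread doesn't name a test runner, but if it happens to be Vitest (an assumption), the same per-process splitting can be pushed into configuration instead of manually reorganizing files - a sketch:

```js
// vitest.config.js - sketch, assuming Vitest as the runner (not stated
// in the thread).
import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    pool: "forks",   // run test files in separate child processes
    isolate: true,   // fresh environment per test file (the default)
    poolOptions: {
      forks: {
        // Cap concurrent processes so the total memory held by all the
        // per-process PGlite instances stays bounded.
        maxForks: 4,
      },
    },
  },
});
```

Each forked process then pays the `PGlite.create` serialization cost independently, so files no longer queue behind each other's database initialization.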
yeah, PGlite is single-connection, so even if you run things in parallel in JS, under the hood it's sequential. You should spawn multiple PGlite instances and use each one for the parallel work.

What I'd do:
- init PGlite, run migrations, seed test data, dump into a file
- init multiple parallel PGlite instances that read that file for init, and use them for the tests
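The two steps above amount to a small instance pool. A rough sketch, with a dummy `createInstance` standing in for `PGlite.create({ loadDataDir: goldenDump })` (the names `makePool`, `warmUp`, `acquire`, and `release` are all illustrative, not PGlite API): pre-warm a fixed number of instances once, then let tests check them out and back in:

```js
// Sketch: a tiny checkout/checkin pool. createInstance is a stand-in
// for PGlite.create({ loadDataDir: goldenDump }) - swap it in for real use.
function makePool(createInstance, size) {
  const idle = [];
  const waiters = [];

  return {
    // Pay the (serialized) creation cost once, up front.
    async warmUp() {
      for (let i = 0; i < size; i++) idle.push(await createInstance());
    },
    async acquire() {
      if (idle.length > 0) return idle.pop();
      // No idle instance: wait until one is released.
      return new Promise((resolve) => waiters.push(resolve));
    },
    release(instance) {
      const waiter = waiters.shift();
      if (waiter) waiter(instance);
      else idle.push(instance);
    },
  };
}

// Demo with a dummy factory; each "instance" is just a counter object.
async function demo() {
  let created = 0;
  const pool = makePool(async () => ({ id: created++ }), 3);
  await pool.warmUp();

  // 10 "tests" share 3 instances; no new instances are created mid-run.
  const results = await Promise.all(
    Array.from({ length: 10 }, async () => {
      const db = await pool.acquire();
      await new Promise((r) => setTimeout(r, 5)); // pretend to run a test
      pool.release(db);
      return db.id;
    })
  );
  return { created, ran: results.length };
}

demo().then(({ created, ran }) => console.log(created, ran)); // -> 3 10
```

The trade-off: pooled instances are reused across tests, so each test has to reset the database state before releasing its instance - you lose the pristine-copy-per-test guarantee that per-test `clone` gives you.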
My issue is that `pglite.close()` doesn't seem to be working: after a test is done, the memory is still held, and it eventually crashes my machine if I have too many tests.

Right now I run sequentially, reusing the same PGlite instance, and have each test define its cleanup script in `afterEach()`.
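One way to avoid hand-written per-test cleanup scripts is a generic reset that truncates every user table in `public`. A sketch: `db.query`/`db.exec` are the PGlite query methods, but the `mockDb` below stands in for a real instance so the SQL-building part runs on its own:

```js
// Sketch: build one TRUNCATE statement covering every table in the
// public schema, so a single afterEach() hook resets the database
// instead of each test maintaining its own cleanup script.
async function resetDatabase(db) {
  const { rows } = await db.query(
    `SELECT tablename FROM pg_tables WHERE schemaname = 'public'`
  );
  if (rows.length === 0) return null;
  const tables = rows.map((r) => `"public"."${r.tablename}"`).join(", ");
  // RESTART IDENTITY resets sequences; CASCADE handles FK dependencies.
  const sql = `TRUNCATE ${tables} RESTART IDENTITY CASCADE`;
  await db.exec(sql);
  return sql;
}

// Demo with a mock db so this runs without @electric-sql/pglite.
const mockDb = {
  async query() {
    return { rows: [{ tablename: "users" }, { tablename: "orders" }] };
  },
  async exec() {},
};

resetDatabase(mockDb).then((sql) => console.log(sql));
// -> TRUNCATE "public"."users", "public"."orders" RESTART IDENTITY CASCADE
```

Note that TRUNCATE resets the data but not anything seeded by migrations outside `public`, and it won't help with the `close()` memory issue itself - it only removes the need for per-test cleanup logic.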
> you should spawn multiple PGlite instances and then use each one for the parallel thing [...] init multiple parallel PGlite instances that read that file for init

That's what my example does: each "test" creates a new in-memory PGlite instance. However, creating a new PGlite instance is serialized with all other calls to `PGlite.create`, not parallel (as shown in my example above). This becomes an issue when my test file has 10 independent tests firing off in parallel: in reality, each test has to wait in the queue to get its database (and the wait time grows linearly with the number of tests/databases), so they end up executing mostly sequentially.
mmmm xD my bad, I didn't see the last few rows haha