[Bug]: Invalid Input Error: File '[...].parquet' too small to be a Parquet file
### Describe the bug

`evidence build` crashes with "too small to be a Parquet file" when `evidence sources` earlier reported `Finished, wrote 0 rows.` for one of the sources.
### Steps to Reproduce

1. Use the Postgres connector as a source, with a SQL query that returns 0 rows.
2. Create a page which uses this source.
3. Run `evidence sources` and then `evidence build`.
### Logs

```
$ evidence sources
✔ Loading plugins & sources
-----
[Processing] xxx
  yyy ✔ Finished, wrote 0 rows.
-----
Evaluated sources, saving manifest
✅ Done!
```
```
$ evidence build
vite v5.2.10 building SSR bundle for production...
✓ 1429 modules transformed.
node:internal/event_target:1094
  process.nextTick(() => { throw err; });
  ^
Error: Invalid Input Error: File 'xxx_yyy.parquet' too small to be a Parquet file
    at Et.runQuery (/evidence/node_modules/@evidence-dev/universal-sql/node_modules/@duckdb/duckdb-wasm/dist/duckdb-node-blocking.cjs:76:82930)
    at dr.query (/evidence/node_modules/@evidence-dev/universal-sql/node_modules/@duckdb/duckdb-wasm/dist/duckdb-node-blocking.cjs:76:70474)
    at setParquetURLs (file:///evidence/node_modules/@evidence-dev/universal-sql/src/client-duckdb/node.js:124:15)
    at async loadDB (file:///evidence/workspace/.evidence/template/.svelte-kit/output/server/entries/pages/_layout.js:33:5)
Emitted 'error' event on Worker instance at:
    at [kOnErrorMessage] (node:internal/worker:326:10)
    at [kOnMessage] (node:internal/worker:337:37)
    at MessagePort.<anonymous> (node:internal/worker:232:57)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:820:20)
    at MessagePort.<anonymous> (node:internal/per_context/messageport:23:28)
Node.js v20.17.0
Build failed
```
### System Info

```
System:
  OS: Linux 6.1 Debian GNU/Linux 12 (bookworm) 12 (bookworm)
  CPU: (20) x64 13th Gen Intel(R) Core(TM) i5-13500
  Memory: 57.97 GB / 62.60 GB
  Container: Yes
  Shell: 5.2.15 - /bin/bash
Binaries:
  Node: 20.17.0 - /usr/local/bin/node
  Yarn: 1.22.22 - /usr/local/bin/yarn
  npm: 10.8.2 - /usr/local/bin/npm
npmPackages:
  @evidence-dev/core-components: ^4.7.5 => 4.7.5
  @evidence-dev/evidence: ^39.1.4 => 39.1.4
  @evidence-dev/postgres: ^1.0.6 => 1.0.6
```
### Severity

blocking all usage of Evidence
### Additional Information, or Workarounds

Previous versions of Evidence (using `plugin-connector` instead of the new SDK implementation) used to output this instead during `evidence sources`:

```
test1 ⚠ Finished. 0 rows, did not create table
```
A similar check seems to be in place here: https://github.com/evidence-dev/evidence/blob/b9bc16480ca273a826dc228d0050a4e6a354fc95/packages/lib/sdk/src/plugins/datasources/evalSources.js#L141-L144

However, it does not trigger, because `table.rows` is `[Function (anonymous)]` in my case (AFAICT it could also be an array).
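To illustrate why the check is missed (this is not the actual SDK code, just a language-agnostic sketch in Python; `row_count` and its handling of a callable are my own illustration): a zero-row guard that only inspects an array-valued `rows` silently passes when `rows` is a lazy function, so normalizing both shapes first would make the guard fire.

```python
def row_count(rows):
    """Illustrative zero-row guard: `rows` may be a materialized
    list, or a callable that lazily produces the rows (the situation
    described above, where `table.rows` is `[Function (anonymous)]`)."""
    if callable(rows):
        rows = rows()  # materialize the lazy rows first
    return len(list(rows))

# Both shapes now hit the same zero-row path:
assert row_count([]) == 0
assert row_count(lambda: []) == 0
assert row_count(lambda: [{"a": 1}]) == 1
```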
Now, because `buildMultipartParquet` from https://github.com/evidence-dev/evidence/blob/next/packages/lib/universal-sql/src/build-parquet.js returns early when no rows are present, no parquet file is written at all.

Changing `buildMultipartParquet` to always write a parquet file – in this case one without rows, which still has a file size > 0 – as I've proposed before in https://github.com/evidence-dev/evidence/issues/1850, would also fix this issue.
During `evidence build`, when the missing parquet file is queried for SSR, the aforementioned error occurs.

Also, interestingly, a zero-sized file at `.evidence/template/static/data/xxx/yyy/yyy.parquet` (which is usually created during the sources step) only seems to be created later, during the build step...
One workaround that works for us: generate the full dataset (no filtering) with `npm run sources`, then use a parquet tool such as the `pyarrow` package to filter the full dataset into a per-tenant dataset:

```python
import pyarrow.parquet as pq

# For each parquet file to filter:
table = pq.read_table(file, filters=[('filter_col', '=', filter_val)])
pq.write_table(table, dest_file)
```