parquet-wasm
Improve documentation around calling `.free`
Reproduction
import { tableFromJSON, tableToIPC } from "apache-arrow";
import * as Parquet from "parquet-wasm";
// Sample data
const testData = [
{ id: 1, name: "John" },
{ id: 2, name: "Jane" },
];
// Create an Arrow table from the test data
const arrowTable = tableFromJSON(testData);
console.log(arrowTable);
// Create a Parquet Table from the Arrow table
const wasmTable = Parquet.Table.fromIPCStream(tableToIPC(arrowTable, "stream"));
console.log(wasmTable);
// Write the Parquet table to a buffer
const writerProperties = new Parquet.WriterPropertiesBuilder().build();
const parquetData = Parquet.writeParquet(wasmTable, writerProperties);
// Attempt to free the Parquet Table
wasmTable.free();
Output
tsx json-parquet-2.ts
Table {
schema: Schema {
fields: [ [Field], [Field] ],
metadata: Map(0) {},
dictionaries: Map(1) { 0 => [Utf8] },
metadataVersion: 4
},
batches: [ RecordBatch { schema: [Schema], data: [Data] } ],
_offsets: Uint32Array(2) [ 0, 2 ]
}
Table { __wbg_ptr: 2369000 }
/Users/drewbitt/Repos/Pantomath/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359
throw new Error(getStringFromWasm0(arg0, arg1));
^
Error: null pointer passed to rust
at module.exports.__wbindgen_throw (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359:11)
at wasm://wasm/014c002a:wasm-function[6573]:0x405d03
at wasm://wasm/014c002a:wasm-function[6574]:0x405d10
at wasm://wasm/014c002a:wasm-function[3297]:0x3a06de
at wasm://wasm/014c002a:wasm-function[4074]:0x3c8bd1
at Table.free (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:2095:14)
at <anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:11)
at Object.<anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:16)
at Module._compile (node:internal/modules/cjs/loader:1376:14)
at Object.S (/Users/drewbitt/.local/share/mise/installs/npm-tsx/4.7.1/lib/node_modules/tsx/dist/cjs/index.cjs:1:1292)
Node.js v20.11.0
I'm not very familiar with this space, so let me know if this is expected for some reason. Thanks!
Yeah... this part can be confusing. The tl;dr is that `writeParquet` frees the table itself. We should probably clarify this in the function's docstring.
Functions exported from Rust through wasm-bindgen can take their inputs either by reference or by value, and taking an input by value consumes the input object. Here, `writeParquet` takes the input table by value, and so consumes its data. Calling `.free()` on the table afterwards therefore fails, because there is nothing left to free.
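To make the by-reference versus by-value distinction concrete, here is a minimal sketch in plain TypeScript. `FakeTable`, `inspectByRef`, and `writeByValue` are hypothetical stand-ins written for illustration; they are not parquet-wasm's actual generated code, and the thrown message only imitates the one in the report.

```typescript
// Simplified, hypothetical model of a wasm-bindgen-style wrapper class.
class FakeTable {
  __wbg_ptr: number;

  constructor(ptr: number) {
    this.__wbg_ptr = ptr;
  }

  // Hand the raw pointer to the caller and null out the JS-side handle.
  __destroy_into_raw(): number {
    const ptr = this.__wbg_ptr;
    this.__wbg_ptr = 0;
    return ptr;
  }

  free(): void {
    const ptr = this.__destroy_into_raw();
    if (ptr === 0) {
      // Assumption: imitates the error seen in the report.
      throw new Error("null pointer passed to rust");
    }
    // A real binding would release the wasm-side memory here.
  }
}

// By reference: the handle is only read, so it stays valid afterwards.
function inspectByRef(table: FakeTable): number {
  return table.__wbg_ptr;
}

// By value: ownership moves out of the JS object, leaving a null pointer.
function writeByValue(table: FakeTable): Uint8Array {
  const ptr = table.__destroy_into_raw();
  if (ptr === 0) {
    throw new Error("null pointer passed to rust");
  }
  // The Rust side would use and then drop the data behind `ptr` here.
  return new Uint8Array(0);
}

const table = new FakeTable(2369000);
inspectByRef(table);
console.log(table.__wbg_ptr); // 2369000, still usable
writeByValue(table);
console.log(table.__wbg_ptr); // 0, consumed
```

In this model, calling `table.free()` after `writeByValue(table)` throws, which mirrors what the reproduction above runs into.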
You can always inspect the `__wbg_ptr` property of a wasm object to see whether its data has been freed. If the pointer is 0, it's a null pointer and the data has already been freed.
> let wasm = require('parquet-wasm/node')
> let properties = new wasm.WriterPropertiesBuilder().build()
undefined
> properties.__wbg_ptr
2621480
> properties.free()
undefined
> properties.__wbg_ptr
0
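Building on that pointer check, a small guard can skip `.free()` when the object has already been consumed. This is only a sketch: `WasmHandle` and `freeIfAlive` are hypothetical names, `__wbg_ptr` is an internal detail of the generated bindings rather than a documented API, and the object literal is a stand-in so the snippet runs without parquet-wasm installed.

```typescript
// Structural type covering the two members the guard relies on.
interface WasmHandle {
  __wbg_ptr: number;
  free(): void;
}

// Hypothetical helper: call free() only if the pointer is still non-null.
// Returns true if free() was called, false if the data was already gone.
function freeIfAlive(obj: WasmHandle): boolean {
  if (obj.__wbg_ptr === 0) {
    return false;
  }
  obj.free();
  return true;
}

// Stand-in object that nulls its pointer on free, like the REPL session above.
const handle: WasmHandle = {
  __wbg_ptr: 2621480,
  free() {
    this.__wbg_ptr = 0;
  },
};

console.log(freeIfAlive(handle)); // true, free() was called
console.log(freeIfAlive(handle)); // false, nothing left to free
```

Applied to the reproduction, `freeIfAlive(wasmTable)` after `writeParquet` should return `false` instead of throwing, assuming the pointer is 0 at that point as described above.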
Thank you! That was helpful
I think adding that to the docstring and not erroring when this happens - stopping all execution - would be nice to have. A console.warn would be more suitable.
> not erroring when this happens - stopping all execution - would be nice to have
That's not something I can control. The error is thrown by the bindings that Rust's wasm-bindgen generates automatically.
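Since the library can't change this, callers who prefer a warning could wrap the call on their side. A rough sketch, where `Freeable` and `freeOrWarn` are hypothetical names and the throwing object is a stand-in for an already-consumed table:

```typescript
// Anything exposing a free() method, e.g. a wasm-bindgen object.
interface Freeable {
  free(): void;
}

// Hypothetical helper: try to free the object and downgrade a failure
// to a console.warn instead of letting the exception propagate.
function freeOrWarn(obj: Freeable, label: string): boolean {
  try {
    obj.free();
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.warn(`Could not free ${label}: ${message}`);
    return false;
  }
}

// Stand-in that fails the way the consumed table does in the report.
const consumed: Freeable = {
  free() {
    throw new Error("null pointer passed to rust");
  },
};

console.log(freeOrWarn(consumed, "wasmTable")); // false, after a warning
```

The trade-off is that swallowing the error also hides genuine double-free mistakes, so this is probably best kept to cleanup paths where a failed `free` is harmless.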
Let's keep this open as a reminder to improve the documentation here.