
Improve documentation around calling `.free`

Open drewbitt opened this issue 1 year ago • 4 comments

Reproduction

import { tableFromJSON, tableToIPC } from "apache-arrow";
import * as Parquet from "parquet-wasm";

// Sample data
const testData = [
  { id: 1, name: "John" },
  { id: 2, name: "Jane" },
];

// Create an Arrow table from the test data
const arrowTable = tableFromJSON(testData);
console.log(arrowTable);

// Create a Parquet Table from the Arrow table
const wasmTable = Parquet.Table.fromIPCStream(tableToIPC(arrowTable, "stream"));
console.log(wasmTable);

// Write the Parquet table to a buffer
const writerProperties = new Parquet.WriterPropertiesBuilder().build();
const parquetData = Parquet.writeParquet(wasmTable, writerProperties);

// Attempt to free the Parquet Table
wasmTable.free();

Output

tsx json-parquet-2.ts

Table {
  schema: Schema {
    fields: [ [Field], [Field] ],
    metadata: Map(0) {},
    dictionaries: Map(1) { 0 => [Utf8] },
    metadataVersion: 4
  },
  batches: [ RecordBatch { schema: [Schema], data: [Data] } ],
  _offsets: Uint32Array(2) [ 0, 2 ]
}
Table { __wbg_ptr: 2369000 }
/Users/drewbitt/Repos/Pantomath/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359
    throw new Error(getStringFromWasm0(arg0, arg1));
          ^

Error: null pointer passed to rust
    at module.exports.__wbindgen_throw (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:3359:11)
    at wasm://wasm/014c002a:wasm-function[6573]:0x405d03
    at wasm://wasm/014c002a:wasm-function[6574]:0x405d10
    at wasm://wasm/014c002a:wasm-function[3297]:0x3a06de
    at wasm://wasm/014c002a:wasm-function[4074]:0x3c8bd1
    at Table.free (/Users/drewbitt/Repos/x/benchmarking/node_modules/parquet-wasm/node/parquet_wasm.js:2095:14)
    at <anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:11)
    at Object.<anonymous> (/Users/drewbitt/Repos/x/benchmarking/json-parquet-2.ts:23:16)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Object.S (/Users/drewbitt/.local/share/mise/installs/npm-tsx/4.7.1/lib/node_modules/tsx/dist/cjs/index.cjs:1:1292)

Node.js v20.11.0

I'm not very familiar with this space, so let me know if this is expected for some reason. Thanks!

drewbitt, Apr 25 '24 06:04

Yeah... this part can be confusing. The tl;dr is that `writeParquet` frees the table itself. We should probably clarify this in the function's docstring.

Functions exported from Rust through wasm-bindgen can take inputs either by reference or by value, and taking an input by value consumes the input object. Here, `writeParquet` takes the input table by value, so it consumes the table's data. The later call to `wasmTable.free()` then has nothing left to free, which is what produces the null pointer error.

You can always check the `__wbg_ptr` property of a wasm object to see whether its data has been freed. If the pointer is 0, it's a null pointer and the data has already been freed.
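To make the by-reference versus by-value distinction concrete, here is a small TypeScript model of the ownership behaviour described above. It is only a sketch: `FakeTable`, `takePtr`, `inspectByRef` and `writeByValue` are hypothetical names invented for illustration, not parquet-wasm's API or the actual code wasm-bindgen generates, and the error text is simply copied from the stack trace in the report.

```typescript
// Simplified model of wasm-bindgen ownership. NOT the real generated bindings.
class FakeTable {
  // Non-zero while this object owns wasm memory; 0 once consumed or freed.
  __wbg_ptr: number;

  constructor(ptr: number) {
    this.__wbg_ptr = ptr;
  }

  // Hand the raw pointer to a callee and null out our own copy (hypothetical helper).
  takePtr(): number {
    const ptr = this.__wbg_ptr;
    this.__wbg_ptr = 0;
    return ptr;
  }

  free(): void {
    const ptr = this.takePtr();
    if (ptr === 0) {
      // Mirrors the message seen in the report; in reality the Rust side raises it.
      throw new Error("null pointer passed to rust");
    }
    // A real binding would release the wasm allocation here.
  }
}

// By reference: the callee only borrows the table; the caller still owns it.
function inspectByRef(table: FakeTable): number {
  return table.__wbg_ptr;
}

// By value: the callee consumes the table, as writeParquet is said to do above.
function writeByValue(table: FakeTable): Uint8Array {
  const ptr = table.takePtr();
  if (ptr === 0) {
    throw new Error("null pointer passed to rust");
  }
  return new Uint8Array(0); // placeholder for the encoded Parquet bytes
}

const modelTable = new FakeTable(2369000);
inspectByRef(modelTable);
const stillOwned = modelTable.__wbg_ptr !== 0; // borrowed, so still live

writeByValue(modelTable);
const consumed = modelTable.__wbg_ptr === 0; // moved into the callee

console.log({ stillOwned, consumed });
```

In this model, calling `modelTable.free()` after `writeByValue` throws, which matches the behaviour in the reproduction: once a by-value function has consumed the object, the caller should not free it again.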

> let wasm = require('parquet-wasm/node')
> let properties = new wasm.WriterPropertiesBuilder().build()
undefined
> properties.__wbg_ptr
2621480
> properties.free()
undefined
> properties.__wbg_ptr
0
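The same check can be wrapped in a small helper. This is a minimal sketch that assumes the object exposes a numeric `__wbg_ptr` property, as in the session above; `WasmHandle` and `isFreed` are hypothetical names, and the plain objects below are stand-ins so the snippet runs without loading the wasm module.

```typescript
// Minimal structural type for an object that exposes its wasm pointer.
// `__wbg_ptr` is an internal detail of the generated bindings and may change.
interface WasmHandle {
  __wbg_ptr: number;
}

// True when the pointer is 0, i.e. the data has been freed or consumed.
function isFreed(handle: WasmHandle): boolean {
  return handle.__wbg_ptr === 0;
}

// Stand-ins for a live and an already-freed wasm object.
const liveHandle: WasmHandle = { __wbg_ptr: 2621480 };
const freedHandle: WasmHandle = { __wbg_ptr: 0 };

console.log(isFreed(liveHandle)); // false
console.log(isFreed(freedHandle)); // true
```

With a real parquet-wasm object, a guard such as `if (!isFreed(obj)) obj.free();` would skip the second free. The package's TypeScript typings may not declare `__wbg_ptr`, in which case a cast would be needed to pass the object to this helper.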

kylebarron, Apr 25 '24 14:04

Thank you! That was helpful

I think it would be nice to add that to the docstring, and to not throw an error that stops all execution when this happens. A console.warn would be more suitable.

drewbitt, Apr 25 '24 15:04

"not erroring when this happens - stopping all execution - would be nice to have"

That's not something I can control. The error comes from the bindings that Rust's wasm-bindgen generates automatically.
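Because the throw comes from the generated bindings, one option is to handle it on the caller's side. The sketch below wraps `.free()` in a try/catch and downgrades the failure to a `console.warn`; `Freeable` and `tryFree` are hypothetical names, not part of parquet-wasm, and the object used here is a stand-in that throws the same message as in the report.

```typescript
// Anything with a free() method, such as a wasm-bindgen object.
interface Freeable {
  free(): void;
}

// Try to free the object; warn instead of throwing if the bindings reject it.
// Returns true when free() succeeded and false when it threw.
function tryFree(obj: Freeable, label: string = "wasm object"): boolean {
  try {
    obj.free();
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.warn(`Could not free ${label}: ${message}`);
    return false;
  }
}

// Stand-in for a table that a by-value function has already consumed.
const alreadyConsumed: Freeable = {
  free(): void {
    throw new Error("null pointer passed to rust");
  },
};

const freedOk = tryFree(alreadyConsumed, "wasmTable");
console.log(freedOk); // false
```

Swallowing the error this way can hide genuine ownership bugs, so it is probably best limited to cleanup paths where a double free is harmless.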

kylebarron, Apr 25 '24 15:04

Let's keep this open as a reminder to improve the documentation here.

kylebarron, Apr 25 '24 15:04