arrow-tools
A collection of handy CLI tools to convert CSV and JSON to Apache Arrow and Parquet
Hi @domoritz, as others have mentioned, these tools are really powerful. Thanks for the great work. I'd like to add these to the [scoop](https://scoop.sh) repositories. Scoop is a very convenient...
I tried the new releases but get an error. ``` > csv2arrow data/simple.csv -n Schema: { "fields": [ { "name": "a", "data_type": "Int64", "nullable": true, "dict_id": 0, "dict_is_ordered": false, "metadata":...
Hi there! First of all thank you for the tooling, it's incredibly powerful. I have been using `json2parquet` to process some intricate `.jsonl` files. I have had a good time...
Thanks for making these tools. They are great. It would help non-Rustaceans to have schema examples for nontrivial types such as Decimal128 and Dictionary.
See https://github.com/domoritz/json2parquet/issues/99 by @cardi
Some basic CI testing would be great to prevent regressions.
It's safest to infer the schema on the entire dataset. When the dataset is larger than RAM, this is currently not possible via stdin as the implementation in #10 and...
Not sure I am doing this right, but I am trying to convert a CSV containing some timestamp to a parquet file. Sample CSV ``` 072e4a64-2ffb-437c-9458-4953abaa7a20,1,2023-01-18 23:05:10,104,-1,0 072e4a64-2ffb-437c-9458-4953abaa7a20,2,2023-01-18 23:05:10,104,-1,0 072e4a64-2ffb-437c-9458-4953abaa7a20,4,2023-01-18...