Wakahisa

42 comments by Wakahisa

What could work in the interim is to use DataFusion's in-memory datasource (https://docs.rs/datafusion/2.0.0/datafusion/datasource/memory/index.html). Once we have async support on the Parquet side, we can switch to the relevant methods.
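For anyone who wants to try that route, here's a minimal sketch, assuming the DataFusion 2.0-era API linked above (the table name `t`, the column `a`, and the values are placeholders):

```rust
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use datafusion::datasource::memory::MemTable;
use datafusion::error::Result;
use datafusion::execution::context::ExecutionContext;

fn main() -> Result<()> {
    // A batch that stands in for data an async Parquet reader would produce.
    let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;

    // Register the batches as an in-memory table that SQL queries can target.
    let table = MemTable::try_new(schema, vec![vec![batch]])?;
    let mut ctx = ExecutionContext::new();
    ctx.register_table("t", Box::new(table));

    Ok(())
}
```

Once async Parquet support lands, only the part that produces the `RecordBatch`es should need to change.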

I'll play around with this in the December break. I've been wanting to write a few blog posts about using Rust for data engineering; this could make for a good...

Hey @aldanor, saw your `fast-float` crate, then on your GH profile I noticed that you opened an issue on this repo. Development has moved to https://github.com/apache/arrow, where the `parquet` crate...

Thanks @sadikovi, I was confused by the UTC semantics of the timestamp logical type. Writing a timestamp now works with `message schema { REQUIRED INT64 MyField (TIMESTAMP_MILLIS); }`, but I'm unable to...
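For reference, a small sketch of parsing that schema with the `parquet` crate's schema parser and printer (`parse_message_type` and `print_schema`):

```rust
use parquet::schema::parser::parse_message_type;
use parquet::schema::printer::print_schema;

fn main() -> parquet::errors::Result<()> {
    // TIMESTAMP_MILLIS annotates the INT64 physical type; values are
    // milliseconds since the Unix epoch.
    let message = "
        message schema {
            REQUIRED INT64 MyField (TIMESTAMP_MILLIS);
        }
    ";
    let schema = parse_message_type(message)?;
    print_schema(&mut std::io::stdout(), &schema);
    Ok(())
}
```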

This is unrelated, but I've seen that CSV readers are being implemented in Arrow (Python and C++, I think, and there's a Go one in an open PR). BurntSushi's `rust-csv` got...
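For comparison, the Rust `arrow` crate's CSV reader can already infer a schema from a sample of rows. A minimal sketch, assuming the arrow 2.0-era `ReaderBuilder` API (where the reader exposes an inherent `next()` method) and a hypothetical `data.csv`:

```rust
use std::fs::File;

use arrow::csv::ReaderBuilder;
use arrow::error::Result;

fn main() -> Result<()> {
    let file = File::open("data.csv")?;

    // Infer the schema from the first 100 records, then read in batches.
    let mut reader = ReaderBuilder::new()
        .has_header(true)
        .infer_schema(Some(100))
        .with_batch_size(1024)
        .build(file)?;

    while let Some(batch) = reader.next()? {
        println!("read a batch of {} rows", batch.num_rows());
    }
    Ok(())
}
```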

Yes, I've been following the Rust impl in Arrow. When I'm ready, I'll ask on the mailing list before opening a JIRA (I didn't see an existing one). The extension to...

Thanks @sadikovi. For number 4, I forgot that when modifying a Parquet file, one in fact rewrites a new file. On number 3, a CSV use case is simpler because I won't have...

I've made some progress with generating a schema by inspecting a sample of CSV values. An easier write API would be great, as right now I don't know how to...
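To illustrate how verbose the current write path is, here's a sketch of the low-level column-writer API, assuming the `parquet` crate of that era; the file name, field, and values are placeholders:

```rust
use std::fs::File;
use std::sync::Arc;

use parquet::column::writer::ColumnWriter;
use parquet::file::properties::WriterProperties;
use parquet::file::writer::{FileWriter, SerializedFileWriter};
use parquet::schema::parser::parse_message_type;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Even a single non-null INT64 column takes this much ceremony.
    let schema = Arc::new(parse_message_type(
        "message schema { REQUIRED INT64 id; }",
    )?);
    let props = Arc::new(WriterProperties::builder().build());
    let file = File::create("out.parquet")?;

    let mut writer = SerializedFileWriter::new(file, schema, props)?;
    let mut row_group = writer.next_row_group()?;
    while let Some(mut col_writer) = row_group.next_column()? {
        if let ColumnWriter::Int64ColumnWriter(ref mut typed) = col_writer {
            // REQUIRED fields take no definition or repetition levels.
            typed.write_batch(&[1, 2, 3], None, None)?;
        }
        row_group.close_column(col_writer)?;
    }
    writer.close_row_group(row_group)?;
    writer.close()?;
    Ok(())
}
```

Every row group and column has to be opened and closed by hand; that is the ceremony an easier write API would hide.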

@xrl, I haven't gotten there with nulls yet. Here's my code (https://gist.github.com/nevi-me/443025fe11038e2709083db2e24a5e64) to read a CSV with strings and integers. I can do trial & error for other field types. Not...

The solution here might be to bump the Arrow version. This project seems to still use 1.0.1, and LZ4 compression seems to have been enabled by default in Arrow version...