Wakahisa

42 comments by Wakahisa

What could work in the interim is to use DataFusion's in-memory datasource (https://docs.rs/datafusion/2.0.0/datafusion/datasource/memory/index.html). Once we have async support on the Parquet side, we can switch to the relevant methods.
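For anyone who wants to try that route, here's a minimal sketch, assuming the DataFusion 2.0-era API linked above (the table name `t`, the column `a`, and the values are placeholders):

```rust
use std::sync::Arc;

use arrow::array::Int64Array;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use datafusion::datasource::memory::MemTable;
use datafusion::error::Result;
use datafusion::execution::context::ExecutionContext;

fn main() -> Result<()> {
    // A batch that stands in for data an async Parquet reader would produce.
    let schema = Arc::new(Schema::new(vec![Field::new("a", DataType::Int64, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int64Array::from(vec![1, 2, 3]))],
    )?;

    // Register the batches as an in-memory table that SQL queries can target.
    let table = MemTable::try_new(schema, vec![vec![batch]])?;
    let mut ctx = ExecutionContext::new();
    ctx.register_table("t", Box::new(table));

    Ok(())
}
```

Once async Parquet support lands, only the part that produces the `RecordBatch`es should need to change.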

I'll play around with this in the December break. I've been wanting to write a few blog posts about using Rust for data engineering; this could make for a good...

Hey @aldanor, saw your `fast-float` crate, then on your GH profile I noticed that you opened an issue on this repo. Development has moved to https://github.com/apache/arrow, where the `parquet` crate...

Thanks @sadikovi, I was confused by the UTC semantics of the timestamp logical type. Writing a timestamp now works with `message schema { REQUIRED INT64 MyField (TIMESTAMP_MILLIS); }`, but I'm unable to...
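For reference, a small sketch of parsing that schema with the `parquet` crate's schema parser and printer (`parse_message_type` and `print_schema`):

```rust
use parquet::schema::parser::parse_message_type;
use parquet::schema::printer::print_schema;

fn main() -> parquet::errors::Result<()> {
    // TIMESTAMP_MILLIS annotates the INT64 physical type; values are
    // milliseconds since the Unix epoch.
    let message = "
        message schema {
            REQUIRED INT64 MyField (TIMESTAMP_MILLIS);
        }
    ";
    let schema = parse_message_type(message)?;
    print_schema(&mut std::io::stdout(), &schema);
    Ok(())
}
```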

This is unrelated, but I've seen that CSV readers are being implemented in Arrow (Python and C++, I think, and there's a Go one in an open PR). BurntSushi's `rust-csv` got...
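For comparison, the Rust `arrow` crate's CSV reader can already infer a schema from a sample of rows. A minimal sketch, assuming the arrow 2.0-era `ReaderBuilder` API (where the reader exposes an inherent `next()` method) and a hypothetical `data.csv`:

```rust
use std::fs::File;

use arrow::csv::ReaderBuilder;
use arrow::error::Result;

fn main() -> Result<()> {
    let file = File::open("data.csv")?;

    // Infer the schema from the first 100 records, then read in batches.
    let mut reader = ReaderBuilder::new()
        .has_header(true)
        .infer_schema(Some(100))
        .with_batch_size(1024)
        .build(file)?;

    while let Some(batch) = reader.next()? {
        println!("read a batch of {} rows", batch.num_rows());
    }
    Ok(())
}
```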

Yes, I've been following the Rust impl in Arrow. When I'm ready, I'll ask on the mailing list before opening a JIRA (I didn't see an existing one). The extension to...

Thanks @sadikovi. For number 4, I forgot that when modifying a Parquet file, one in fact rewrites a new file. On number 3, a CSV use case is simpler because I won't have...

I've made some progress with generating a schema by inspecting a sample of CSV values. An easier write API would be great, as right now I don't know how to...
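To illustrate how verbose the current write path is, here's a sketch of the low-level column-writer API, assuming the `parquet` crate of that era; the file name, field, and values are placeholders:

```rust
use std::fs::File;
use std::sync::Arc;

use parquet::column::writer::ColumnWriter;
use parquet::file::properties::WriterProperties;
use parquet::file::writer::{FileWriter, SerializedFileWriter};
use parquet::schema::parser::parse_message_type;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Even a single non-null INT64 column takes this much ceremony.
    let schema = Arc::new(parse_message_type(
        "message schema { REQUIRED INT64 id; }",
    )?);
    let props = Arc::new(WriterProperties::builder().build());
    let file = File::create("out.parquet")?;

    let mut writer = SerializedFileWriter::new(file, schema, props)?;
    let mut row_group = writer.next_row_group()?;
    while let Some(mut col_writer) = row_group.next_column()? {
        if let ColumnWriter::Int64ColumnWriter(ref mut typed) = col_writer {
            // REQUIRED fields take no definition or repetition levels.
            typed.write_batch(&[1, 2, 3], None, None)?;
        }
        row_group.close_column(col_writer)?;
    }
    writer.close_row_group(row_group)?;
    writer.close()?;
    Ok(())
}
```

Every row group and column has to be opened and closed by hand; that is the ceremony an easier write API would hide.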

@xrl, I haven't gotten there with nulls yet. Here's my code (https://gist.github.com/nevi-me/443025fe11038e2709083db2e24a5e64) to read a CSV with strings and integers. I can do trial & error for other field types. Not...

The solution here might be to bump the Arrow version. This project seems to still use 1.0.1, and LZ4 compression seems to have been enabled by default in Arrow version...