pandas pull requests for .to_avro/.read_avro are welcome!
thanks @mariusvniekerk
This has a whole bunch of C dependencies and no Windows support.
Pretty easy to build all of the dependencies as conda packages
Where does pandas stand on optional dependencies for top-level APIs? Is it okay to add this to pandas?
ok for something like this.
we bundled the c-deps in-line for msgpack, but that was reasonably small. So that's an option (at some point).
conda-only is ok as well. This is a purely optional feature; if people want to use it, they need to install the deps (or use conda, which they should be doing anyhow).
The biggest question I have: is there a standard-ish schema already out there for DataFrame-type data? Even though I ended up creating an internal one for msgpack, I think it's better to reuse an existing one.
I have a converter function that will infer a schema for a given dataframe. It should work for a reasonable range of types.
Non-primitive classes are not supported atm. It's probably not something that makes a lot of sense in any case.
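For illustration only, here is a minimal sketch of what such schema inference could look like. It is not the converter mentioned above: the `DTYPE_TO_AVRO` table, the `infer_avro_schema` name, and the choice to make every field nullable are all assumptions. It takes a plain mapping of column names to dtype names so that it runs without pandas installed.

```python
import json

# Hypothetical mapping from pandas/NumPy dtype names to Avro primitive types.
DTYPE_TO_AVRO = {
    "bool": "boolean",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "object": "string",        # assumes object columns hold text
    "datetime64[ns]": "long",  # stored as Unix epoch milliseconds
}


def infer_avro_schema(dtypes, name="DataFrame"):
    """Build an Avro record schema from a {column: dtype name} mapping."""
    fields = []
    for column, dtype in dtypes.items():
        try:
            avro_type = DTYPE_TO_AVRO[str(dtype)]
        except KeyError:
            # Non-primitive types are rejected rather than guessed at.
            raise TypeError(f"unsupported dtype {dtype!r} for column {column!r}")
        # A union with "null" allows missing values in the column.
        fields.append({"name": str(column), "type": ["null", avro_type]})
    return {"type": "record", "name": name, "fields": fields}


schema = infer_avro_schema({"id": "int64", "price": "float64", "label": "object"})
print(json.dumps(schema, indent=2))
```

With a real dataframe `df`, the input mapping could be built as `{col: str(dtype) for col, dtype in df.dtypes.items()}`. Column names would also need to be valid Avro names (letters, digits, and underscores, not starting with a digit), which this sketch does not check.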
gr8!
yeh, that all sounds good.
The only types that are problematic in a generic sense are timestamps.
Avro does not provide a native timestamp type, so these are just converted to long (Unix epoch milliseconds). We can easily add some metadata to the Avro header to preserve these types when the file is read back with pandas. Other systems, though, would just see a long.
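As a rough sketch of the timestamp handling described here, using only the standard library: the helper names and the `pandas.timestamp_columns` metadata key are hypothetical, not an existing convention. Avro file headers carry a metadata map from string keys to byte values (keys starting with `avro.` are reserved), so a reader that knows the key could restore the timestamp columns.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(ts):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Convert Unix epoch milliseconds back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def timestamp_metadata(columns):
    """Header metadata naming the columns that hold timestamps.

    The key is hypothetical; the value is bytes, as Avro metadata requires.
    """
    return {"pandas.timestamp_columns": json.dumps(sorted(columns)).encode("utf-8")}


ts = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
ms = to_epoch_millis(ts)          # 1420113600000
restored = from_epoch_millis(ms)  # equal to ts
meta = timestamp_metadata(["created_at"])
```

A reader without knowledge of the metadata key would still see a plain long in each of these columns, which matches the caveat above.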