cyavro

pandas pull requests for .to_avro/.read_avro are welcome!

Open · jreback opened this issue 10 years ago · 6 comments

thanks @mariusvniekerk

jreback · Dec 02 '15 20:12

This has a whole bunch of C dependencies and no Windows support.

It's pretty easy to build all of the dependencies as conda packages, though.

Where does pandas stand on optional dependencies for top-level APIs? Is that okay to add to pandas?

mariusvniekerk · Dec 02 '15 22:12

That's OK for something like this.

We bundled the C deps inline for msgpack, but that was reasonably small, so that's an option (at some point).

Conda-only is OK as well. This is a purely optional feature; if people want to use it, they need to install the deps (or use conda, which they should be using anyhow).

The biggest question I have is: is there a standard-ish schema already out there for DataFrame-type stuff? (Even though I ended up creating an internal one for msgpack, I think it's better to hijack an existing one.)

jreback · Dec 02 '15 22:12
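The optional-dependency approach discussed above usually means importing the C-backed library only when the feature is actually used, so pandas itself never requires it at install time. A minimal sketch under that assumption; `import_optional_dependency` and `read_avro` here are hypothetical stand-ins, and cyavro's own reading API is deliberately not shown because the thread does not describe it:

```python
import importlib


def import_optional_dependency(name, feature):
    """Import an optional dependency, or fail with an actionable message.

    Hypothetical helper: the module is imported only when `feature` is used.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        raise ImportError(
            f"{feature} requires the optional dependency {name!r}; "
            f"install it (e.g. via conda) to use this feature"
        ) from None


def read_avro(path):
    """Hypothetical top-level reader that defers importing cyavro until called."""
    cyavro = import_optional_dependency("cyavro", "read_avro")
    # Placeholder: a real reader would call into cyavro here. Its API is not
    # described in this thread, so it is intentionally left out of the sketch.
    raise NotImplementedError(f"reading {path!r} via {cyavro.__name__} is not sketched")
```

Importing pandas stays cheap and dependency-free; users only hit the ImportError (with install instructions) if they actually call the Avro functions.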

I have a converter function that infers a schema for a given DataFrame. It should work for a reasonable range of types.

Non-primitive classes are not supported at the moment. It probably doesn't make much sense in any case.

mariusvniekerk · Dec 02 '15 22:12
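A minimal sketch of the kind of schema inference described above, assuming a dtype-to-Avro mapping of our own choosing. `DTYPE_TO_AVRO` and `infer_avro_schema` are hypothetical names, not the converter mentioned in the thread. It takes a mapping of column names to dtype names, so it runs without pandas installed, and it rejects anything without a primitive Avro equivalent:

```python
import json

# Hypothetical mapping from pandas dtype names to Avro primitive types.
# Timestamps become "long" (epoch milliseconds), as described in the thread.
DTYPE_TO_AVRO = {
    "bool": "boolean",
    "int32": "int",
    "int64": "long",
    "float32": "float",
    "float64": "double",
    "object": "string",  # assumption: object columns hold Python str values
    "datetime64[ns]": "long",
}


def infer_avro_schema(dtypes, name="DataFrame"):
    """Build an Avro record schema from a {column: dtype} mapping.

    Raises TypeError for dtypes with no primitive Avro equivalent,
    mirroring the "non-primitive classes are not supported" limitation.
    """
    fields = []
    for column, dtype in dtypes.items():
        try:
            avro_type = DTYPE_TO_AVRO[str(dtype)]
        except KeyError:
            raise TypeError(f"unsupported dtype for column {column!r}: {dtype}") from None
        fields.append({"name": str(column), "type": avro_type})
    return {"type": "record", "name": name, "fields": fields}


schema = infer_avro_schema({"id": "int64", "price": "float64", "label": "object"})
print(json.dumps(schema))
```

With pandas installed, `infer_avro_schema(df.dtypes)` should also work, since `df.dtypes` is a Series whose `items()` yields column/dtype pairs. Newer extension dtypes (e.g. `string`) and nullable columns, which would need `["null", ...]` unions, are left out of this sketch.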

Great!

Yeah, that all sounds good.

jreback · Dec 02 '15 22:12

The only types that are problematic in a generic sense are timestamps.

Avro does not provide a native timestamp type, so these are just converted to Avro's long type (Unix epoch milliseconds). We can easily add some metadata to the Avro header to preserve these types when the file is read with pandas. Other systems, though, would just see a long.

mariusvniekerk · Dec 02 '15 23:12
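One way to handle the timestamp issue raised above, sketched with only the standard library: store timestamps as epoch milliseconds and record which columns were timestamps in the file header metadata. The key `pandas.timestamp_columns` and the helper names are hypothetical; the Avro specification reserves metadata keys starting with `avro.` and stores header metadata values as bytes, which is why the column list is JSON-encoded:

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Hypothetical metadata key; keys starting with "avro." are reserved by Avro.
PANDAS_TS_KEY = "pandas.timestamp_columns"


def to_epoch_millis(ts):
    """Convert a timezone-aware datetime to integer Unix epoch milliseconds."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(ms):
    """Inverse of to_epoch_millis, returning a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


def timestamp_metadata(columns):
    """Header metadata telling a pandas-aware reader which long columns are timestamps."""
    return {PANDAS_TS_KEY: json.dumps(sorted(columns)).encode("utf-8")}


ts = datetime(2015, 12, 2, 23, 12, tzinfo=timezone.utc)
ms = to_epoch_millis(ts)
meta = timestamp_metadata(["created_at"])
print(ms, from_epoch_millis(ms) == ts, meta)
```

A reader that does not know the key still sees plain long columns, which matches the trade-off described above. Later revisions of the Avro specification (1.8 onward) also define a `timestamp-millis` logical type that annotates a long with this same encoding.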

Ah, I see. Yeah, fully converting pandas types is actually a bit tricky (see here for how the type conversions for msgpack are done).

We've also used a schema defined here for JSON-y data.

jreback · Dec 02 '15 23:12
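Along the lines of the type-conversion question above, here is a sketch of one way to keep full pandas dtype information: tag each Avro field with an extra attribute naming its pandas dtype. The Avro specification permits attributes it does not define as metadata, provided they do not affect the serialized data, so other readers should just see the plain Avro types. The attribute name `pandas_dtype` and both helpers are hypothetical, not the msgpack conversion code referenced in the thread:

```python
# Hypothetical attribute name for the pandas dtype of each field.
DTYPE_ATTR = "pandas_dtype"


def annotate_fields(schema, dtypes):
    """Return a copy of an Avro record schema with each field tagged by its pandas dtype."""
    fields = [
        dict(field, **{DTYPE_ATTR: str(dtypes[field["name"]])})
        for field in schema["fields"]
    ]
    return dict(schema, fields=fields)


def restore_dtypes(schema):
    """Recover {column: dtype-name}, falling back to the plain Avro type if untagged."""
    return {
        field["name"]: field.get(DTYPE_ATTR, field["type"])
        for field in schema["fields"]
    }


schema = {
    "type": "record",
    "name": "DataFrame",
    "fields": [{"name": "ts", "type": "long"}, {"name": "n", "type": "long"}],
}
annotated = annotate_fields(schema, {"ts": "datetime64[ns]", "n": "int64"})
print(restore_dtypes(annotated))  # {'ts': 'datetime64[ns]', 'n': 'int64'}
print(restore_dtypes(schema))  # {'ts': 'long', 'n': 'long'}
```

The annotated schema keeps the same wire types, so the trade-off is the one described for timestamps: pandas can round-trip exact dtypes, while other systems fall back to the Avro primitives.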