Dynamically defined parquet schema

Open loicalleyne opened this issue 3 years ago • 2 comments

With the recent PR merge enabling the use of map[string]interface{}, the remaining challenge is the schema. Does anyone have recommendations for creating a Parquet schema with nested fields dynamically? I've seen implementations like dataframe-go that use github.com/ompluscator/dynamic-struct to create a struct at runtime, and I've also looked into building a JSON schema dynamically, but both approaches seem rather cumbersome when there are more than a handful of data types and many complex nested fields to handle. I was hoping for something simpler, like what's alluded to by stephane-moreau in
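One way to skip generating Go structs entirely is to walk a sample map[string]interface{} recursively and emit the textual Parquet message-type notation, where nested maps become groups and []interface{} values become the standard three-level LIST structure. This is a minimal sketch that assumes nothing about any parquet-go API: the names SchemaFromMap, writeGroup and writeField are hypothetical. The type mapping (Go ints to int64, floats to double, strings to binary (STRING)), marking every field optional, and inferring a list's element type from its first element are all illustrative assumptions. Whether a given library accepts this text, and whether it expects (STRING) or the older (UTF8) spelling, needs checking against that library's docs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// writeField emits one field of the schema. Nested maps become groups,
// []interface{} becomes a 3-level LIST, and scalars become primitives.
// Every field is marked optional (an assumption for this sketch).
func writeField(b *strings.Builder, name string, v interface{}, depth int) error {
	indent := strings.Repeat("  ", depth)
	switch t := v.(type) {
	case map[string]interface{}:
		fmt.Fprintf(b, "%soptional group %s {\n", indent, name)
		if err := writeGroup(b, t, depth+1); err != nil {
			return err
		}
		fmt.Fprintf(b, "%s}\n", indent)
	case []interface{}:
		// The element type is inferred from the first element only.
		if len(t) == 0 {
			return fmt.Errorf("field %q: cannot infer element type of empty list", name)
		}
		fmt.Fprintf(b, "%soptional group %s (LIST) {\n", indent, name)
		fmt.Fprintf(b, "%s  repeated group list {\n", indent)
		if err := writeField(b, "element", t[0], depth+2); err != nil {
			return err
		}
		fmt.Fprintf(b, "%s  }\n", indent)
		fmt.Fprintf(b, "%s}\n", indent)
	case bool:
		fmt.Fprintf(b, "%soptional boolean %s;\n", indent, name)
	case int, int32, int64:
		fmt.Fprintf(b, "%soptional int64 %s;\n", indent, name)
	case float32, float64:
		fmt.Fprintf(b, "%soptional double %s;\n", indent, name)
	case string:
		fmt.Fprintf(b, "%soptional binary %s (STRING);\n", indent, name)
	default:
		// Includes nil (JSON null), whose type cannot be inferred.
		return fmt.Errorf("field %q: unsupported type %T", name, v)
	}
	return nil
}

// writeGroup emits all fields of a map in sorted key order, because Go map
// iteration order is random and the schema should be stable.
func writeGroup(b *strings.Builder, m map[string]interface{}, depth int) error {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if err := writeField(b, k, m[k], depth); err != nil {
			return err
		}
	}
	return nil
}

// SchemaFromMap (hypothetical) builds a message-type schema string from a sample row.
func SchemaFromMap(name string, m map[string]interface{}) (string, error) {
	var b strings.Builder
	fmt.Fprintf(&b, "message %s {\n", name)
	if err := writeGroup(&b, m, 1); err != nil {
		return "", err
	}
	b.WriteString("}\n")
	return b.String(), nil
}

func main() {
	row := map[string]interface{}{
		"id":      int64(1),
		"name":    "alice",
		"tags":    []interface{}{"a", "b"},
		"address": map[string]interface{}{"city": "Paris", "zip": "75001"},
	}
	s, err := SchemaFromMap("row", row)
	if err != nil {
		panic(err)
	}
	fmt.Print(s)
}
```

Inferring from a single sample row is the main weakness of this approach: a field that is null or an empty list in that row cannot be typed, so in practice one would merge types across several rows or fall back to a user-supplied override.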

loicalleyne, Dec 09 '21 22:12

I'm chiming in here: I'm also looking for a way to define a schema without using Arrow. I ended up writing a dynamic CSV schema, which works until one hits nested types.

sdressler, Jun 08 '22 16:06

@sdressler I've written a package that generates an Arrow schema from an arbitrary map[string]interface{}; maybe it's helpful: github.com/loicalleyne/arrow_schemagen
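A related pitfall when generating any schema from an arbitrary map[string]interface{}: if the map was produced by encoding/json's default decoding, every JSON number arrives as float64, so a generator that inspects Go types sees only floating-point columns. The sketch below (the helper names normalize and decodeRow are hypothetical, and this thread does not say how arrow_schemagen itself treats numbers) decodes with json.Decoder.UseNumber and then converts each json.Number to int64 when it parses as an integer, otherwise to float64, before the map is handed to a schema generator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normalize walks a value decoded with UseNumber and replaces each
// json.Number with int64 if it parses as an integer, else float64.
// Maps and slices are updated in place.
func normalize(v interface{}) (interface{}, error) {
	switch t := v.(type) {
	case map[string]interface{}:
		for k, child := range t {
			n, err := normalize(child)
			if err != nil {
				return nil, err
			}
			t[k] = n // updating an existing key during range is allowed
		}
		return t, nil
	case []interface{}:
		for i, child := range t {
			n, err := normalize(child)
			if err != nil {
				return nil, err
			}
			t[i] = n
		}
		return t, nil
	case json.Number:
		if i, err := t.Int64(); err == nil {
			return i, nil
		}
		f, err := t.Float64()
		if err != nil {
			return nil, err
		}
		return f, nil
	default:
		return v, nil
	}
}

// decodeRow decodes one JSON object, keeping integers distinct from floats.
func decodeRow(s string) (map[string]interface{}, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber() // keep numbers as json.Number instead of float64
	var m map[string]interface{}
	if err := dec.Decode(&m); err != nil {
		return nil, err
	}
	if _, err := normalize(m); err != nil {
		return nil, err
	}
	return m, nil
}

func main() {
	m, err := decodeRow(`{"id": 7, "score": 9.5, "items": [{"qty": 2}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%T %T\n", m["id"], m["score"]) // prints "int64 float64"
}
```

Note that a value like 1.0 or 1e3 is kept as float64 here, since only plain integer literals parse with Int64; a column that mixes 1 and 1.5 across rows would still need a widening rule when schemas from several rows are merged.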

loicalleyne, Jul 21 '23 02:07