parquet-go

Add more examples for reading parquet files

Open • cmackenzie1 opened this issue 1 year ago • 1 comment

I'm looking for more guidance on how to read Parquet files, especially dynamic ones that have no corresponding Go struct to decode into. I looked through the unit tests for examples, but found few that access values without first reading rows into a Go struct.

So far I've got the following:

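```go
import (
	"bytes"
	"errors"
	"fmt"
	"io"

	"github.com/parquet-go/parquet-go" // formerly github.com/segmentio/parquet-go
)

func readDynamic(data []byte) error {
	reader := parquet.NewGenericReader[any](bytes.NewReader(data))
	defer reader.Close()
	// Lookup takes the column path as variadic strings; for a.b use Lookup("a", "b").
	id, ok := reader.Schema().Lookup("id")
	if !ok {
		return errors.New(`column "id" not found`)
	}

	rows := make([]parquet.Row, 64) // reusable batch buffer
	for {
		n, err := reader.ReadRows(rows)
		for _, row := range rows[:n] {
			// A Row is a flat []parquet.Value; v.Column() is the value's leaf column index.
			for _, v := range row {
				if v.Column() == id.ColumnIndex {
					fmt.Println("id:", v)
				}
			}
		}
		if errors.Is(err, io.EOF) {
			return nil
		} else if err != nil {
			return err
		}
	}
}
```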
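When column names aren't known ahead of time, one option is to resolve every leaf column once, before the read loop, instead of calling `Lookup` per column. This is a minimal sketch, assuming `(*parquet.Schema).Columns()` returns leaf-column paths in the same order as the indices reported by `Value.Column()`, and that no field name itself contains a dot; `columnIndexByName` is a hypothetical helper, not part of parquet-go:

```go
package main

import "strings"

// columnIndexByName maps dotted leaf-column names ("id", "a.b") to their
// column index. Hypothetical helper: it assumes paths is in column-index
// order, e.g. as returned by (*parquet.Schema).Columns(), and that no
// field name contains a "." (otherwise two paths could collide).
func columnIndexByName(paths [][]string) map[string]int {
	byName := make(map[string]int, len(paths))
	for i, path := range paths {
		byName[strings.Join(path, ".")] = i
	}
	return byName
}
```

Inside the loop over a row, the result can be compared with `v.Column()`. Check the map's `ok` result before doing so, since a missing key yields 0, which is also a valid column index.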

Is this the best way to read them? Is there an existing method to determine whether one column path is a subset of (nested under) another, e.g. whether `a.b` falls under `a`?
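Whether parquet-go already ships such a check isn't confirmed here. As a minimal sketch, a path prefix test over `[]string` paths is short to write by hand; `isPathPrefix` and `leafColumnsUnder` are hypothetical helpers, and the second one assumes `paths` is in column-index order (for example from `(*parquet.Schema).Columns()`):

```go
package main

// isPathPrefix reports whether prefix is an ancestor of (or equal to) path,
// comparing element by element: ["a"] is a prefix of ["a", "b"].
// Hypothetical helper, not a parquet-go API.
func isPathPrefix(prefix, path []string) bool {
	if len(prefix) > len(path) {
		return false
	}
	for i := range prefix {
		if prefix[i] != path[i] {
			return false
		}
	}
	return true
}

// leafColumnsUnder returns the indices of the leaf columns nested under
// prefix. Hypothetical helper: paths is assumed to be in column-index order,
// e.g. from (*parquet.Schema).Columns().
func leafColumnsUnder(paths [][]string, prefix []string) []int {
	var indices []int
	for i, path := range paths {
		if isPathPrefix(prefix, path) {
			indices = append(indices, i)
		}
	}
	return indices
}
```

Comparing path elements rather than dotted strings avoids a false match: `strings.HasPrefix("ab", "a")` is true, yet a top-level column `ab` is not nested under `a`.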

cmackenzie1 • Mar 17 '23