parquet-go
Add more examples for reading parquet files
I am looking for more guidance around how to read parquet files, especially when reading dynamic parquet files without a corresponding Go struct to read the data into. I was looking at some of the unit tests to see examples, but wasn't able to find many that didn't use a Go struct to read data into first before accessing values.
So far I've got the following:
reader := parquet.NewGenericReader[any](bytes.NewReader(data))
defer reader.Close()
schema := reader.Schema()
rows := make([]parquet.Row, reader.NumRows())
n, err := reader.ReadRows(rows)
if err != nil && !errors.Is(err, io.EOF) {
	return err
}
// only the first n rows were actually filled in
for _, row := range rows[:n] {
	// a parquet.Row is a []parquet.Value, one entry per leaf value
	for _, value := range row {
		_ = value.Column() // index of the leaf column this value belongs to
	}
	// known columns can be accessed using
	schema.Lookup("id")
	// and nested columns like
	schema.Lookup("a", "b")
}
Is that the best way to read them? Are there already existing methods to determine if one column path is a subset of another?
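For the column path question, a minimal sketch using only the standard library, assuming "subset" means that one path is a leading prefix of another (for example `["a"]` versus `["a", "b"]`). The helper name `isPathPrefix` is hypothetical and is not part of parquet-go; whether the library already exposes an equivalent is exactly what is being asked here.

```go
package main

import "fmt"

// isPathPrefix reports whether prefix matches the leading segments of path.
// Hypothetical helper for illustration; not a parquet-go function.
// Equal paths are treated as a prefix of each other.
func isPathPrefix(prefix, path []string) bool {
	if len(prefix) > len(path) {
		return false
	}
	for i, name := range prefix {
		if path[i] != name {
			return false
		}
	}
	return true
}

func main() {
	leaf := []string{"a", "b"}
	fmt.Println(isPathPrefix([]string{"a"}, leaf))           // true
	fmt.Println(isPathPrefix([]string{"a", "b"}, leaf))      // true
	fmt.Println(isPathPrefix([]string{"b"}, leaf))           // false
	fmt.Println(isPathPrefix([]string{"a", "b", "c"}, leaf)) // false
}
```

Comparing segment by segment, rather than joining the paths with "." and using a string prefix check, avoids false matches such as `a.b` against `a.bc`. The same check could presumably be run over the `[][]string` returned by `schema.Columns()` to find every leaf under a given group.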