frostdb icon indicating copy to clipboard operation
frostdb copied to clipboard

Clarify that column names are sorted when schema is created.

Open joeblubaugh opened this issue 3 years ago • 5 comments

This set of column definitions:

[]dynparquet.ColumnDefinition{
{
	Name:          "timestamp",
	StorageLayout: parquet.Int(64),
	Dynamic:       false,
},
{
	Name:          "state",
	StorageLayout: parquet.String(),
	Dynamic:       false,
},
{
	Name:          "labels",
	StorageLayout: parquet.Encoded(parquet.Optional(parquet.String()), &parquet.RLEDictionary),
	Dynamic:       true,
},
}

will actually have column indexes: {labels: 0, state: 1, timestamp: 2}

So, when inserting fields, you must get the column indexes by searching the schema's column slice, rather than based on your own definition.

Documentation on functions and/or examples where this happens would be useful to new contributors.


Aside: what is the reason that column names are sorted on schema creation?

joeblubaugh avatar Jun 10 '22 06:06 joeblubaugh

We need to have them sorted so that when we merge two row groups with different dynamic columns we can perform a k-way merge. We’ve been thinking about improving the ergonomics of the insert API so that people don’t have to care about the index positions, but haven’t come up with anything satisfactory (yet).

Would you be willing to contribute some docs where it would have been helpful for you to understand that this sorting is happening?

brancz avatar Jun 10 '22 07:06 brancz

Sure thing. I'll send a PR when I get a bit of spare time. If GH lets you, assigning it to me would be great - then it'll appear in my standard GH searches that I use.

joeblubaugh avatar Jun 15 '22 05:06 joeblubaugh

Done!

brancz avatar Jun 15 '22 05:06 brancz

I know it doesn't improve the original issue, but I just want to point out here how Parca uses a switch-case to get the correct columns and then builds the index around that (the columnIndex). https://github.com/parca-dev/parca/blob/5bfbe7ebe340a8e0cfea28a2589117a83250744b/pkg/parcacol/sample.go#L128

metalmatze avatar Jun 16 '22 09:06 metalmatze

@metalmatze thanks, that's a helpful example.

joeblubaugh avatar Jun 21 '22 03:06 joeblubaugh

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jan 06 '24 01:01 github-actions[bot]