frostdb
L1 arrow compaction
It may be useful to have the option to compact L0 arrow records into L1 arrow records instead of Parquet.
This may only be worth pursuing once the run-end encoding (REE) support changes have landed in FrostDB and the record sorting implementation (https://github.com/apache/arrow/pull/34719) is complete.
Agreed. I think moving to arrow-only in-mem would be the last step in this quarter.
I've been thinking about this. Is it the same as arrowutils.MergeRecords(arrow_parts...) |> arrowutils.SortRecord |> parts.NewArrowPart?
Yes, although given the arrow parts should be merged on input, there probably isn't a need for the downstream sort. I'd also be interested in getting some L0 to L1 stats on how much memory we reduce through arrow compaction vs parquet compaction.
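To illustrate why the downstream sort may be unnecessary, here is a minimal sketch of an order-preserving k-way merge. It assumes each L0 part is already sorted on the same key and reduces every row to a single int64 sort key. The names row, rowHeap and mergeSortedRuns are hypothetical and are not FrostDB APIs. This is only a model of the idea, not how arrowutils.MergeRecords is implemented.

```go
package main

import "container/heap"

// row is a hypothetical stand-in for one record row, reduced to its sort key.
type row struct {
	key int64
	src int // index of the input run the row came from
	pos int // position of the row inside that run
}

// rowHeap is a min-heap of rows ordered by key.
type rowHeap []row

func (h rowHeap) Len() int           { return len(h) }
func (h rowHeap) Less(i, j int) bool { return h[i].key < h[j].key }
func (h rowHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *rowHeap) Push(x any)        { *h = append(*h, x.(row)) }
func (h *rowHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// mergeSortedRuns merges runs that are each already sorted ascending.
// The output is sorted by construction, so no separate sort pass is needed.
func mergeSortedRuns(runs [][]int64) []int64 {
	total := 0
	h := &rowHeap{}
	for i, r := range runs {
		total += len(r)
		if len(r) > 0 {
			// Seed the heap with the head of every non-empty run.
			*h = append(*h, row{key: r[0], src: i, pos: 0})
		}
	}
	heap.Init(h)

	out := make([]int64, 0, total)
	for h.Len() > 0 {
		top := heap.Pop(h).(row)
		out = append(out, top.key)
		// Advance within the run the smallest row came from.
		if next := top.pos + 1; next < len(runs[top.src]) {
			heap.Push(h, row{key: runs[top.src][next], src: top.src, pos: next})
		}
	}
	return out
}
```

If the inputs are not individually sorted, this guarantee no longer holds and a sort step would still be required.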
@asubiotto can you expand a bit on the memory expectations for arrow vs. parquet compaction?
I was always under the impression that parquet with compression gives better memory savings than arrow.
Yes, this is why I'd be interested in getting some numbers so we are informed about the tradeoffs. Intuitively, dictionary encoding should go a long way. We've also been thinking about experimenting with run end encoding in arrow.
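As a rough intuition for why these encodings could help, the sketch below dictionary-encodes a string column and then run-end encodes the resulting indices. It uses plain Go slices rather than the Arrow library, and the function names dictEncode and runEndEncode are hypothetical. Actual savings depend on the data and would still need to be measured.

```go
package main

// dictEncode replaces each string with an index into a dictionary of
// distinct values, so a repeated string is stored only once.
func dictEncode(values []string) (dict []string, indices []int32) {
	seen := make(map[string]int32)
	indices = make([]int32, 0, len(values))
	for _, v := range values {
		idx, ok := seen[v]
		if !ok {
			idx = int32(len(dict))
			seen[v] = idx
			dict = append(dict, v)
		}
		indices = append(indices, idx)
	}
	return dict, indices
}

// runEndEncode collapses consecutive repeats into runs.
// runEnds[i] is the exclusive logical end offset of run i.
func runEndEncode(indices []int32) (runValues []int32, runEnds []int32) {
	for i, v := range indices {
		if i > 0 && v == indices[i-1] {
			// Same value as the previous row: extend the current run.
			runEnds[len(runEnds)-1] = int32(i + 1)
			continue
		}
		// Value changed: start a new run ending after this row.
		runValues = append(runValues, v)
		runEnds = append(runEnds, int32(i+1))
	}
	return runValues, runEnds
}
```

For a sorted column with few distinct values, the six-row input a, a, a, b, b, a shrinks to a two-entry dictionary plus three runs. Run-end encoding benefits most when equal values are adjacent, which is one reason sorted parts matter here.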
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
I think it's still useful to keep this open.