L1 arrow compaction

Open thorfour opened this issue 1 year ago • 11 comments

It may be useful to have the option to compact L0 arrow records into L1 arrow records instead of Parquet.

thorfour avatar Apr 27 '23 15:04 thorfour

This may only be worth pursuing once the REE (run-end encoding) support changes have landed in FrostDB and the record sorting implementation (https://github.com/apache/arrow/pull/34719) is completed.

thorfour avatar May 01 '23 21:05 thorfour

Agreed. I think moving to arrow-only in-mem would be the last step in this quarter.

asubiotto avatar May 02 '23 06:05 asubiotto

I have been thinking about this and was wondering whether it is the same as arrowutils.MergeRecords(arrow_parts...) |> arrowutils.SortRecord |> parts.NewArrowPart?

gernest avatar Dec 18 '23 01:12 gernest

Yes, although given that the arrow parts should be merged on input, there probably isn't a need for the downstream sort. I'd also be interested in getting some L0 to L1 stats on how much memory we save through arrow compaction vs parquet compaction.
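
To illustrate why a merge of already-sorted inputs makes a downstream sort redundant, here is a minimal sketch using only the Go standard library. It is not FrostDB code: `cursor`, `cursorHeap`, and `mergeSortedRuns` are hypothetical names, and plain `[]int64` slices stand in for sorted arrow records.

```go
package main

import (
	"container/heap"
	"fmt"
)

// cursor tracks the read position within one sorted input run.
// (Hypothetical type; a stand-in for a sorted arrow record.)
type cursor struct {
	run []int64
	pos int
}

// cursorHeap is a min-heap of cursors ordered by their current value.
type cursorHeap []*cursor

func (h cursorHeap) Len() int           { return len(h) }
func (h cursorHeap) Less(i, j int) bool { return h[i].run[h[i].pos] < h[j].run[h[j].pos] }
func (h cursorHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *cursorHeap) Push(x any)        { *h = append(*h, x.(*cursor)) }
func (h *cursorHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// mergeSortedRuns performs a k-way merge. Each input must already be sorted
// ascending; the output is then sorted without any additional sort pass.
func mergeSortedRuns(runs ...[]int64) []int64 {
	h := &cursorHeap{}
	total := 0
	for _, r := range runs {
		if len(r) > 0 {
			*h = append(*h, &cursor{run: r})
			total += len(r)
		}
	}
	heap.Init(h)

	out := make([]int64, 0, total)
	for h.Len() > 0 {
		c := (*h)[0] // cursor holding the smallest current value
		out = append(out, c.run[c.pos])
		c.pos++
		if c.pos == len(c.run) {
			heap.Pop(h) // this run is exhausted
		} else {
			heap.Fix(h, 0) // restore heap order after advancing
		}
	}
	return out
}

func main() {
	merged := mergeSortedRuns([]int64{1, 4, 9}, []int64{2, 3, 10}, []int64{5})
	fmt.Println(merged) // prints [1 2 3 4 5 9 10]
}
```

The merge only preserves order if every input run is sorted on the same key, which is the assumption behind skipping the sort step.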

asubiotto avatar Dec 18 '23 09:12 asubiotto

@asubiotto can you expand a bit on the expected memory difference between arrow and parquet compaction?

I was always under the impression that parquet+compression gives better memory savings than arrow.

gernest avatar Dec 20 '23 04:12 gernest

Yes, this is why I'd be interested in getting some numbers so we are informed about the tradeoffs. Intuitively, dictionary encoding should go a long way. We've also been thinking about experimenting with run end encoding in arrow.
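
As a rough illustration of the two encodings mentioned here, the following toy sketch encodes a repetitive string column both ways. It does not use the Arrow library; `dictEncode` and `runEndEncode` are hypothetical helpers, and the sample column is made up. It only shows the shape of each encoding and says nothing about real memory numbers.

```go
package main

import "fmt"

// dictEncode stores each distinct value once and replaces every row
// with a small integer index into that dictionary.
func dictEncode(col []string) (dict []string, indices []int32) {
	seen := make(map[string]int32)
	indices = make([]int32, 0, len(col))
	for _, v := range col {
		idx, ok := seen[v]
		if !ok {
			idx = int32(len(dict))
			seen[v] = idx
			dict = append(dict, v)
		}
		indices = append(indices, idx)
	}
	return dict, indices
}

// runEndEncode collapses consecutive equal values into runs.
// runEnds[i] is the exclusive logical end index of the run holding values[i].
func runEndEncode(col []string) (runEnds []int32, values []string) {
	for i, v := range col {
		if i > 0 && v == col[i-1] {
			runEnds[len(runEnds)-1] = int32(i + 1) // extend the current run
			continue
		}
		runEnds = append(runEnds, int32(i+1))
		values = append(values, v)
	}
	return runEnds, values
}

func main() {
	// Made-up sample column with repeated, partly consecutive values.
	col := []string{"api", "api", "api", "db", "db", "api"}

	dict, indices := dictEncode(col)
	fmt.Println(dict, indices) // prints [api db] [0 0 0 1 1 0]

	runEnds, values := runEndEncode(col)
	fmt.Println(runEnds, values) // prints [3 5 6] [api db api]
}
```

Dictionary encoding helps whenever a column has few distinct values, while run-end encoding only helps when equal values are adjacent, which is one reason sorted or merged input matters for this comparison.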

asubiotto avatar Dec 20 '23 08:12 asubiotto

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Jan 20 '24 01:01 github-actions[bot]

I think it's still useful to keep this open.

asubiotto avatar Apr 15 '24 06:04 asubiotto