[Enhancement]: Improve Numeric Compression

Open • bkief opened this issue 3 years ago • 4 comments

What type of enhancement is this?

Performance

What subsystems and features will be improved?

Compression

What does the enhancement do?

Numeric values could be compressed by storing them as integers, with the scale and precision kept as column metadata. This would allow the more efficient delta-of-delta compression to be used instead of the lz-array compression currently applied to numeric types. Values whose precision is too large to fit in a 64-bit integer could fall back to the existing lz-array compression.
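A minimal Python sketch of the proposed encoding follows. It assumes a single scale for the whole column (for example the `s` from a `numeric(p, s)` declaration). `to_scaled_ints` and `delta_of_delta` are hypothetical helpers, not TimescaleDB functions. The sketch shows only the delta-of-delta step and omits the bit-packing a real encoder would apply afterwards. A `None` result stands for "fall back to lz-array".

```python
from decimal import Decimal, localcontext

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def to_scaled_ints(values, scale):
    """Return each value times 10**scale as an int, or None to signal the lz-array fallback."""
    out = []
    with localcontext() as ctx:
        ctx.prec = 100  # avoid rounding while rescaling (inputs up to 100 digits)
        for v in values:
            if not v.is_finite():
                return None  # NaN/Infinity: fall back in this sketch
            unscaled = v.scaleb(scale)  # shift the decimal point right by `scale`
            if unscaled != unscaled.to_integral_value():
                return None  # more fractional digits than the column scale allows
            i = int(unscaled)
            if not INT64_MIN <= i <= INT64_MAX:
                return None  # too wide for a 64-bit integer
            out.append(i)
    return out


def delta_of_delta(ints):
    """Encode as [first, first_delta, dod_2, dod_3, ...] (no bit-packing)."""
    if len(ints) < 2:
        return list(ints)
    first_delta = ints[1] - ints[0]
    out = [ints[0], first_delta]
    prev_delta = first_delta
    for a, b in zip(ints[1:], ints[2:]):
        d = b - a
        out.append(d - prev_delta)  # change in the delta
        prev_delta = d
    return out
```

For prices such as 10.25, 10.50, 10.75, 11.00 at scale 2, the integers 1025, 1050, 1075, 1100 encode as `[1025, 25, 0, 0]`. The run of zeros for regularly spaced values is what makes delta-of-delta attractive compared with compressing the raw numeric bytes.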

Implementation challenges

This is similar to how Parquet encodes its DECIMAL type, which could be used as a reference. The hardest part would likely be reconstructing the numeric value after decompression.
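To illustrate why reconstruction is the tricky part, here is a Python sketch of the decode side. It assumes the unscaled-integer layout from the proposal (Parquet's DECIMAL similarly stores an unscaled integer with scale and precision in the schema) and rebuilds a value in the shape PostgreSQL's `numeric` uses internally: base-10000 digits (`NBASE`) plus `sign`, `weight` and display scale `dscale`, as described in PostgreSQL's `numeric.c`. The helpers `undo_delta_of_delta`, `to_numeric_parts` and `parts_to_decimal` are hypothetical; a real implementation would build the C structures directly.

```python
from decimal import Decimal

NBASE = 10000   # PostgreSQL numeric packs 4 decimal digits per "digit"
DEC_DIGITS = 4


def undo_delta_of_delta(enc):
    """Invert [first, first_delta, dod_2, dod_3, ...] back to the integers."""
    if len(enc) < 2:
        return list(enc)
    out = [enc[0], enc[0] + enc[1]]
    delta = enc[1]
    for dod in enc[2:]:
        delta += dod
        out.append(out[-1] + delta)
    return out


def to_numeric_parts(unscaled, scale):
    """Split unscaled * 10**-scale into (sign, weight, dscale, base-10000 digits)."""
    sign = -1 if unscaled < 0 else 1
    mag = abs(unscaled)
    pad = (-scale) % DEC_DIGITS          # align the decimal point to a digit boundary
    mag *= 10 ** pad
    frac_groups = (scale + pad) // DEC_DIGITS
    digits = []
    while mag:
        mag, rem = divmod(mag, NBASE)
        digits.append(rem)
    digits.reverse()                     # most significant digit first
    weight = len(digits) - 1 - frac_groups
    while digits and digits[-1] == 0:    # trailing zero digits are implied
        digits.pop()
    if not digits:
        weight = 0                       # zero has no digits
    return sign, weight, max(scale, 0), digits


def parts_to_decimal(sign, weight, dscale, digits):
    """Rebuild a Decimal from the parts, to check the decomposition."""
    total = sum(
        (Decimal(d) * Decimal(NBASE) ** (weight - i) for i, d in enumerate(digits)),
        Decimal(0),
    )
    return (sign * total).quantize(Decimal(1).scaleb(-dscale))
```

For example, the stream `[1025, 25, 0, 0]` decodes to `[1025, 1050, 1075, 1100]`, and `to_numeric_parts(1050, 2)` gives `(1, 0, 2, [10, 5000])`, i.e. 10 + 5000 × 10000⁻¹ displayed with two decimals as `10.50`. Keeping `dscale` is what preserves trailing zeros on output.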

bkief, Jan 06 '22 17:01

2ndQuadrant (now EDB) has implemented a similar datatype that may be helpful as reference material: https://github.com/2ndQuadrant/fixeddecimal

bkief, Jan 11 '22 18:01

Int64 should allow for 18 significant digits of precision (its maximum is about 9.22 × 10^18). NaN could be stored in the dead space above 10^18, e.g. as int64.max. It's also notable that PG15 will likely support numerics with a negative scale or a scale larger than the precision. This enhancement should support these numeric scales preemptively.
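A Python sketch of the per-value encoding described above, assuming unscaled values are limited to 18 digits so that everything from 10^18 up to int64.max is free for special codes. `NAN_SENTINEL`, `encode_value` and `decode_value` are hypothetical names. Because the scale is applied with `scaleb`, a negative scale (12300 at scale -2 stored as 123) or a scale larger than the precision (0.00123 at scale 5 stored as 123) needs no special handling.

```python
from decimal import Decimal, localcontext

INT64_MAX = 2**63 - 1
MAX_UNSCALED = 10**18 - 1   # every 18-digit value fits in int64
NAN_SENTINEL = INT64_MAX    # hypothetical: a code in the unused space above 10**18


def encode_value(v, scale):
    """Map a Decimal to an int64 code, or None to signal the lz-array fallback."""
    if v.is_nan():
        return NAN_SENTINEL
    if v.is_infinite():
        return None  # not covered by this sketch
    with localcontext() as ctx:
        ctx.prec = 100  # avoid rounding while rescaling
        unscaled = v.scaleb(scale)  # scale may be negative or exceed the precision
        if unscaled != unscaled.to_integral_value():
            return None  # value has digits finer than the column scale
    i = int(unscaled)
    if abs(i) > MAX_UNSCALED:
        return None  # more than 18 digits: keep the sentinel space free
    return i


def decode_value(code, scale):
    """Inverse of encode_value for non-fallback codes."""
    if code == NAN_SENTINEL:
        return Decimal("NaN")
    return Decimal(code).scaleb(-scale)
```

One trade-off worth weighing: a sentinel inside a delta-of-delta stream produces two very large deltas around each NaN, so tracking NaNs separately (like NULLs) might compress better when they are frequent.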

bkief, Jan 28 '22 05:01

What is the current state of NUMERIC compression in Timescale? I know it exists, but beyond that, what can we reasonably expect in terms of performance and space savings when compressing columnar data? I'm guessing it's not quite as good as float, since this issue exists?

ianthetechie, Jun 28 '22 07:06

@svenklemm @erimatnor - Is this feature more feasible with the recent compression API enhancements?

bkief, Feb 27 '24 14:02