trino
Serde improvements
- `DictionaryBlockEncoding`: we don't need to serialize `ids` as integers. We can use `short` or `byte` if the dictionary has few enough positions for every id to fit in that type.
- `VariableWidthBlockEncoding`: we don't need to serialize `offsets` as integers. We can use `short` if `rawSlice` is short enough.
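A minimal sketch of the width selection described above, in Java. `NarrowIntEncoding`, `widthFor` and `encode`, and the layout they produce (one width-tag byte, an `int` count, then the values), are hypothetical and are not Trino's actual `DictionaryBlockEncoding` or `VariableWidthBlockEncoding` code. Values are treated as unsigned, so a `byte` covers values up to 255 and a `short` up to 65535. For dictionary ids the bound would be the dictionary's position count minus one; for offsets it would be the length of `rawSlice`.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Trino's DictionaryBlockEncoding/VariableWidthBlockEncoding.
// Assumed layout: [width tag: 1 byte][count: int][count values of `width` bytes each].
final class NarrowIntEncoding {
    static final byte BYTE_WIDTH = 1;
    static final byte SHORT_WIDTH = 2;
    static final byte INT_WIDTH = 4;

    private NarrowIntEncoding() {}

    // Narrowest width that holds every value in [0, maxValue] when read back as unsigned.
    static byte widthFor(int maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative: " + maxValue);
        }
        if (maxValue <= 0xFF) {
            return BYTE_WIDTH;
        }
        if (maxValue <= 0xFFFF) {
            return SHORT_WIDTH;
        }
        return INT_WIDTH;
    }

    // Writes non-negative values (e.g. dictionary ids or slice offsets) using the narrowest width.
    // For ids, maxValue would be dictionaryPositionCount - 1; for offsets, the rawSlice length.
    static ByteBuffer encode(int[] values, int maxValue) {
        byte width = widthFor(maxValue);
        ByteBuffer buffer = ByteBuffer.allocate(Byte.BYTES + Integer.BYTES + values.length * width);
        buffer.put(width);
        buffer.putInt(values.length);
        for (int value : values) {
            if (value < 0 || value > maxValue) {
                throw new IllegalArgumentException("value out of range: " + value);
            }
            if (width == BYTE_WIDTH) {
                buffer.put((byte) value);        // keeps the low 8 bits; read back with Byte.toUnsignedInt
            } else if (width == SHORT_WIDTH) {
                buffer.putShort((short) value);  // keeps the low 16 bits; read back with Short.toUnsignedInt
            } else {
                buffer.putInt(value);
            }
        }
        buffer.flip();  // position 0, limit = number of bytes written
        return buffer;
    }
}
```

For example, 1,000 ids pointing into a 200-entry dictionary get `widthFor(199) == 1`, so the id section takes 1,000 bytes instead of the 4,000 bytes needed for `int`s. Writing a width tag per block keeps the reader simple: it branches once on the tag rather than per value.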
cc @lukasz-stec
Hi, @sopel39.
I have some questions about this issue. Apologies if they are naive.
- (DictionaryBlockEncoding) In the case of ORC or Parquet, the elements of `ids` are specified as unsigned integers. Would changing them to `short` or `byte` cause a problem?
- (DictionaryBlockEncoding) Even if they are changed to `short` or `byte`, wouldn't deserialization performance decrease, since 2-byte padding must be inserted between the short/byte elements of the slice during deserialization?
I'm interested in this issue and want to understand the exact context, hence these questions.
> (DictionaryBlockEncoding) In the case of ORC or Parquet, the spec of the element constituting ids is Unsigned Integer. Will there be a problem if it is changed to short or byte?
This is unrelated to either ORC or Parquet: these encodings serialize Trino's in-memory blocks, not file-format data.
> (DictionaryBlockEncoding) Even if it is changed to a short or byte type, wouldn't deserialization performance decrease because 2 byte padding must be inserted in the middle of the slice composed of short/byte elements during the deserialization process?
It's more about reducing the size of the payload. A smaller payload means less processing along the way, so it's a win even if CPU usage stays the same.
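To make the deserialization question concrete, here is a hedged sketch of the reading side, assuming a hypothetical layout of one width-tag byte, an `int` count, then the ids. `NarrowIdDecoder` is illustrative and is not Trino code. In this sketch the serialized bytes stay densely packed: each narrow id is zero-extended to `int` as it is copied into the destination array, so no padding is inserted into the slice itself.

```java
import java.nio.ByteBuffer;

// Hypothetical reader for the layout [width tag: 1 byte][count: int][values]; not Trino code.
final class NarrowIdDecoder {
    private NarrowIdDecoder() {}

    // Reads ids stored as 1, 2 or 4 bytes each and widens them into an int[].
    // Widening (zero-extension) happens per element while filling the array;
    // the input buffer is consumed as-is, without padding.
    static int[] decode(ByteBuffer buffer) {
        byte width = buffer.get();
        int count = buffer.getInt();
        int[] ids = new int[count];
        // Branch on the width once per block, then run a tight loop for that case.
        if (width == 1) {
            for (int i = 0; i < count; i++) {
                ids[i] = Byte.toUnsignedInt(buffer.get());
            }
        } else if (width == 2) {
            for (int i = 0; i < count; i++) {
                ids[i] = Short.toUnsignedInt(buffer.getShort());
            }
        } else if (width == 4) {
            for (int i = 0; i < count; i++) {
                ids[i] = buffer.getInt();
            }
        } else {
            throw new IllegalArgumentException("unsupported width: " + width);
        }
        return ids;
    }
}
```

As a rough size comparison, 1,000,000 ids take 4,000,000 bytes as `int`s, 2,000,000 bytes as `short`s and 1,000,000 bytes as `byte`s, which is the payload reduction the reply is about.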