trino Serde improvements

DictionaryBlockEncoding:

We don't need to serialize ids as integers. We can use short or byte when the dictionary has few enough positions for every id to fit in the narrower type.
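The idea can be sketched as follows. This is a minimal illustration, not Trino's `DictionaryBlockEncoding`: the class `DictionaryIdCodec`, its methods, and the byte layout (position count, a one-byte width tag, then the ids) are all assumptions made for the example. Ids are stored unsigned, so a byte covers dictionaries of up to 256 entries and a short covers up to 65536.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Trino.
final class DictionaryIdCodec
{
    private DictionaryIdCodec() {}

    // Smallest width in bytes (1, 2 or 4) that can hold every id in [0, dictionarySize).
    static int idWidth(int dictionarySize)
    {
        if (dictionarySize <= 1 << 8) {
            return Byte.BYTES;
        }
        if (dictionarySize <= 1 << 16) {
            return Short.BYTES;
        }
        return Integer.BYTES;
    }

    // Assumed layout: [int positionCount][byte width][ids at that width], little endian.
    static byte[] writeIds(int[] ids, int dictionarySize)
    {
        int width = idWidth(dictionarySize);
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + Byte.BYTES + ids.length * width)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(ids.length);
        out.put((byte) width);
        for (int id : ids) {
            if (width == Byte.BYTES) {
                out.put((byte) id);
            }
            else if (width == Short.BYTES) {
                out.putShort((short) id);
            }
            else {
                out.putInt(id);
            }
        }
        return out.array();
    }

    // Reads the ids back and widens them to int, masking so narrow values stay unsigned.
    static int[] readIds(byte[] bytes)
    {
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int[] ids = new int[in.getInt()];
        int width = in.get();
        for (int i = 0; i < ids.length; i++) {
            if (width == Byte.BYTES) {
                ids[i] = in.get() & 0xFF;
            }
            else if (width == Short.BYTES) {
                ids[i] = in.getShort() & 0xFFFF;
            }
            else {
                ids[i] = in.getInt();
            }
        }
        return ids;
    }
}
```

For example, `DictionaryIdCodec.writeIds(ids, 256)` spends one byte per id instead of four, and `readIds` restores the original `int[]`.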

VariableWidthBlockEncoding:

We don't need to serialize offsets as integers. We can use short if rawSlice is short enough for every offset to fit.
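A similar sketch for offsets, again under stated assumptions: `OffsetCodec` and its layout are hypothetical and do not mirror Trino's `VariableWidthBlockEncoding`. It assumes the offsets are ascending and that the last offset equals the length of the raw slice, so only that last value needs checking to pick the width.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not part of Trino.
final class OffsetCodec
{
    private OffsetCodec() {}

    // Two bytes suffice when the raw slice is at most 65535 bytes long (unsigned short range).
    static int offsetWidth(int rawLength)
    {
        return rawLength <= 0xFFFF ? Short.BYTES : Integer.BYTES;
    }

    // Assumed layout: [int offsetCount][byte width][offsets at that width], little endian.
    // Assumes offsets are ascending, so the last one is the raw slice length.
    static byte[] writeOffsets(int[] offsets)
    {
        int rawLength = offsets.length == 0 ? 0 : offsets[offsets.length - 1];
        int width = offsetWidth(rawLength);
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + Byte.BYTES + offsets.length * width)
                .order(ByteOrder.LITTLE_ENDIAN);
        out.putInt(offsets.length);
        out.put((byte) width);
        for (int offset : offsets) {
            if (width == Short.BYTES) {
                out.putShort((short) offset);
            }
            else {
                out.putInt(offset);
            }
        }
        return out.array();
    }

    // Reads the offsets back, masking short values so they are treated as unsigned.
    static int[] readOffsets(byte[] bytes)
    {
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int[] offsets = new int[in.getInt()];
        int width = in.get();
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = width == Short.BYTES ? in.getShort() & 0xFFFF : in.getInt();
        }
        return offsets;
    }
}
```

With this layout, a block whose raw slice is under 64 KiB spends two bytes per offset instead of four, and larger blocks fall back to the integer form.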

Sep 21 '22 08:09 sopel39

cc @lukasz-stec

Sep 21 '22 08:09 sopel39

Hi, @sopel39.

I have some questions about this issue. Please bear with me if they are naive.

(DictionaryBlockEncoding) In ORC and Parquet, the spec defines the elements that make up ids as unsigned integers. Will changing them to short or byte cause a problem?
(DictionaryBlockEncoding) Even if ids are changed to short or byte, wouldn't deserialization performance decrease, since 2 bytes of padding must be inserted between the short/byte elements of the slice during deserialization?

I am interested in this issue and want to understand the exact context, which is why I am asking.

Jan 07 '23 04:01 leeyh0216

(DictionaryBlockEncoding) In ORC and Parquet, the spec defines the elements that make up ids as unsigned integers. Will changing them to short or byte cause a problem?

This problem is unrelated to either ORC or Parquet.

(DictionaryBlockEncoding) Even if ids are changed to short or byte, wouldn't deserialization performance decrease, since 2 bytes of padding must be inserted between the short/byte elements of the slice during deserialization?

It's more about reducing the size of the payload. A smaller payload means less processing along the way, which is a win even if CPU usage stays the same.
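As a rough illustration of the payload argument, the arithmetic can be written out. The class `IdPayloadSize` and the position count used below are assumptions for the example, not measurements from Trino.

```java
// Hypothetical helper for illustration only; not part of Trino.
final class IdPayloadSize
{
    private IdPayloadSize() {}

    // Bytes needed to store one id per position at the given width.
    static long bytesFor(long positionCount, int bytesPerId)
    {
        return positionCount * bytesPerId;
    }

    // Fraction of the id payload saved relative to 4-byte integer ids.
    static double savedFraction(int narrowWidth)
    {
        return 1.0 - narrowWidth / (double) Integer.BYTES;
    }
}
```

For 1,000,000 positions, integer ids take 4,000,000 bytes, short ids 2,000,000 bytes, and byte ids 1,000,000 bytes, so the id portion of the payload shrinks by 50% or 75% respectively.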

Jan 11 '23 10:01 sopel39
