trino icon indicating copy to clipboard operation
trino copied to clipboard

Precompute hash values for connector provided distributions

Open sopel39 opened this issue 2 years ago • 5 comments

Currently, HashGenerationOptimizer.Rewriter#visitExchange has a constraint:

// Currently, precomputed hash values are only supported for system hash distributions without constants

It would be beneficial to precompute hash values for connector distributions too (e.g. buckets)

sopel39 avatar Sep 19 '22 09:09 sopel39

cc @gaurav8297 @raunaqmorarka

sopel39 avatar Sep 19 '22 09:09 sopel39

That would be beneficial if we use these hashes more than once. Indeed we do -- for distributing data between the nodes and then within the connector page sink. We maybe also distribute between page sinks on a node, if task writer count > 1?

Anyway, for reuse within connector page sink, what would be a proposed SPI to represent that?

findepi avatar Sep 19 '22 10:09 findepi

We maybe also distribute between page sinks on a node, if task writer count > 1?

That's the idea

Anyway, for reuse within connector page sink, what would be a proposed SPI to represent that?

I don't think a new SPI is needed. I think we just need to use connector bucket function to compute hash (and possibly mangle the hash at remote exchange and local exchange level)

sopel39 avatar Sep 20 '22 10:09 sopel39

Without a SPI change we still will have hash/partitioning inside the connector. See example in Iceberg https://github.com/trinodb/trino/blob/6310e5e1415fe0bc89444016fc516640c3a3fcbb/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergPageSink.java#L266 This should re-use the precomputed hash values.

findepi avatar Sep 20 '22 10:09 findepi

Without a SPI change we still will have hash/partitioning inside the connector. See example in Iceberg

page indexer is a different matter as page indexer is local to sink (it enumerates "seen" partitions). It might consume hash, but the biggest cost in page indexer are hash lookups. There is another issue to improve page indexer, see: https://github.com/trinodb/trino/issues/14183

sopel39 avatar Sep 20 '22 11:09 sopel39