velox
velox copied to clipboard
Add serializeColumn to PrestoVectorSerde
https://github.com/prestodb/presto/pull/23331 adds a native expression optimizer to delegate expression evaluation to the native sidecar. This is used to constant fold expressions on the presto native sidecar, instead of on the presto java coordinator (which is the current behavior). https://github.com/prestodb/presto/pull/22927 implements a proxygen endpoint to accept RowExpressions from NativeSidecarExpressionInterpreter, optimize them if possible (rewrite special form expressions), and compile the RowExpression to a velox expression with constant folding enabled. This velox expression is then converted back to a RowExpression and returned by the sidecar to the coordinator.
When the constant folded velox expression is of type velox::exec::ConstantExpr, we need to return a RowExpression of type ConstantExpression. This requires us to serialize the constant value from velox::exec::ConstantExpr into protocol::ConstantExpression::valueBlock. This can be done by serializing the constant value vector to presto SerializedPage::column format, followed by base 64 encoding the result (reverse engineering the logic from Base64Util.cpp::readBlock). The function serializeColumn can be used to serialize the constant value into the SerializedPage::column format, it is present in PrestoSerializer.cpp but is currently not exposed for use. This PR exposes the serializeColumn function via PrestoVectorSerde.
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | 6e3d6dd1b7cc319f7c538b8ebb4e47c0651ede00 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/67358db55634190008ca6af1 |
@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.
@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.
Thanks @aditi-pandit, updated the PR description to explain why this change is needed.
@kevinwilfong : What do you think of these changes ? Please can you help with this review.
Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.
https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43
Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.
https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43
Hi @pedroerp, thanks for the suggestion. Initially I did try to use the serialize function in PrestoBatchVectorSerializer, which serializes the velox vector in PrestoPage format, to serialize the constant encoded vector from exec::ConstantExpr into protocol::Block::data (for constructing the protocol::ConstantExpression::valueBlock). However, this was found to result in a deserialization error on the Presto coordinator. Upon further investigation, we noticed this was because the Presto coordinator expects just the serialized column from PrestoPage, without the header, in the field protocol::ConstantExpression::valueBlock::data.
We also noticed that in PrestoToVeloxExpr.cpp, the valueBlock from Presto protocol::ConstantExpression is converted to velox vector by deserializing just the column from PrestoPage (without the header) in function VeloxExprConverter::getConstantValue() -> deserializeSingleColumn (we are attempting to do an inverse operation to this function when constructing a Presto ConstantExpression from velox ConstantExpr here).
Using serializeColumn to construct valueBlock in protocol::ConstantExpression fixed the error (ref: function RowExpressionEvaluator::getConstantRowExpression in RowExpressionEvaluator.cpp) and we were able to test it end to end. We were also able to solve the serialization error by just skipping over the fixed number of bytes occupied by the header in PrestoPage, but since there already is a serializeColumn function present in the codebase, we figured it would be better to expose it for use via velox.
Please let me know if I am missing something.
@pramodsatya thank you for clarifying. Could we instead provide an option in PrestoOptions to make the serializer skip the header instead?
I think what I'm slightly concerned is moving VectorStream and all that stuff to the API (header) of this class, making the client more verbose, and the API a bit less intuitive.
@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pedroerp merged this pull request in facebookincubator/velox@00e814941312669889fa9f5417ac4ea63b6e2c79.
Conbench analyzed the 1 benchmark run on commit 00e81494.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details.