velox icon indicating copy to clipboard operation
velox copied to clipboard

Add serializeColumn to PrestoVectorSerde

Open pramodsatya opened this issue 1 year ago • 7 comments
trafficstars

https://github.com/prestodb/presto/pull/23331 adds a native expression optimizer to delegate expression evaluation to the native sidecar. This is used to constant fold expressions on the presto native sidecar, instead of on the presto java coordinator (which is the current behavior). https://github.com/prestodb/presto/pull/22927 implements a proxygen endpoint to accept RowExpressions from NativeSidecarExpressionInterpreter, optimize them if possible (rewrite special form expressions), and compile the RowExpression to a velox expression with constant folding enabled. This velox expression is then converted back to a RowExpression and returned by the sidecar to the coordinator.

When the constant folded velox expression is of type velox::exec::ConstantExpr, we need to return a RowExpression of type ConstantExpression. This requires us to serialize the constant value from velox::exec::ConstantExpr into protocol::ConstantExpression::valueBlock. This can be done by serializing the constant value vector to presto SerializedPage::column format, followed by base 64 encoding the result (reverse engineering the logic from Base64Util.cpp::readBlock). The function serializeColumn can be used to serialize the constant value into the SerializedPage::column format, it is present in PrestoSerializer.cpp but is currently not exposed for use. This PR exposes the serializeColumn function via PrestoVectorSerde.

pramodsatya avatar Aug 05 '24 08:08 pramodsatya

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 6e3d6dd1b7cc319f7c538b8ebb4e47c0651ede00
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/67358db55634190008ca6af1

netlify[bot] avatar Aug 05 '24 08:08 netlify[bot]

@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.

aditi-pandit avatar Aug 15 '24 16:08 aditi-pandit

@pramodsatya : The changes look okay. But please can you explain what specifically in the linked PR needed these changes. I do recall our discussion. But it would be good for other reviewers as well.

Thanks @aditi-pandit, updated the PR description to explain why this change is needed.

pramodsatya avatar Aug 15 '24 17:08 pramodsatya

@kevinwilfong : What do you think of these changes ? Please can you help with this review.

aditi-pandit avatar Sep 10 '24 09:09 aditi-pandit

Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.

https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43

pedroerp avatar Oct 24 '24 15:10 pedroerp

Hi @pramodsatya, not sure if I'm following. Why can't you just serialize the results of the expression (which are probably encoded as constant vectors) as PrestoPage using this API? It will maintain the encoding, then they can be deserialized by the Presto Java code.

https://github.com/facebookincubator/velox/blob/main/velox/serializers/PrestoSerializer.h#L41-L43

Hi @pedroerp, thanks for the suggestion. Initially I did try to use the serialize function in PrestoBatchVectorSerializer, which serializes the velox vector in PrestoPage format, to serialize the constant encoded vector from exec::ConstantExpr into protocol::Block::data (for constructing the protocol::ConstantExpression::valueBlock). However, this was found to result in a deserialization error on the Presto coordinator. Upon further investigation, we noticed this was because the Presto coordinator expects just the serialized column from PrestoPage, without the header, in the field protocol::ConstantExpression::valueBlock::data. We also noticed that in PrestoToVeloxExpr.cpp, the valueBlock from Presto protocol::ConstantExpression is converted to velox vector by deserializing just the column from PrestoPage (without the header) in function VeloxExprConverter::getConstantValue() -> deserializeSingleColumn (we are attempting to do an inverse operation to this function when constructing a Presto ConstantExpression from velox ConstantExpr here).

Using serializeColumn to construct valueBlock in protocol::ConstantExpression fixed the error (ref: function RowExpressionEvaluator::getConstantRowExpression in RowExpressionEvaluator.cpp) and we were able to test it end to end. We were also able to solve the serialization error by just skipping over the fixed number of bytes occupied by the header in PrestoPage, but since there already is a serializeColumn function present in the codebase, we figured it would be better to expose it for use via velox.

Please let me know if I am missing something.

pramodsatya avatar Oct 24 '24 19:10 pramodsatya

@pramodsatya thank you for clarifying. Could we instead provide an option in PrestoOptions to make the serializer skip the header instead?

I think what I'm slightly concerned is moving VectorStream and all that stuff to the API (header) of this class, making the client more verbose, and the API a bit less intuitive.

pedroerp avatar Oct 26 '24 01:10 pedroerp

@pedroerp has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 16 '24 01:11 facebook-github-bot

@pedroerp merged this pull request in facebookincubator/velox@00e814941312669889fa9f5417ac4ea63b6e2c79.

facebook-github-bot avatar Nov 16 '24 03:11 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit 00e81494.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

conbench-facebook[bot] avatar Nov 16 '24 04:11 conbench-facebook[bot]