velox icon indicating copy to clipboard operation
velox copied to clipboard

Fix Presto Java UUID serialization

Open BryanCutler opened this issue 1 year ago • 4 comments

This fixes the PrestoSerializer to put UUID values in the correct format that is expected by Presto Java so that the values will match those from a Java worker. First, when converting UUID to/from string, the values are no longer in big endian format (as taken from boost::uuid) and are instead stored as a little endian in an int128_t. Secondly, Presto Java will read UUID values from an Int128ArrayBlock with the first value as the most significant bits. To correct this, the upper/lower parts of the int128_t are swapped during serialization/deserialization.

A unit test for checking roundtrip UUID serializaiton was added and manual testing of Presto with a native worker to verify the problem from the issue description is fixed.

From https://github.com/prestodb/presto/issues/23311

BryanCutler avatar Oct 08 '24 20:10 BryanCutler

Deploy Preview for meta-velox canceled.

Name Link
Latest commit bf000b0c87f96aa94312b6b1e52129ef88c681c1
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/673d76747b8ff600082c98c2

netlify[bot] avatar Oct 08 '24 20:10 netlify[bot]

please review @aditi-pandit @Yuhta @mbasmanova

BryanCutler avatar Oct 08 '24 20:10 BryanCutler

Thanks @Yuhta , but folly::Endian::big is not defined for int128 types. It's based off of compiler built-in bytes swapping functions, so it looks like it could be added for GCC variants, but there is no built-in function for MSVC. I could add a check to these changes to only swap if the host system is LE, wdyt?

BryanCutler avatar Oct 09 '24 22:10 BryanCutler

@BryanCutler You can add a utility to DecimalUtil to do that. We can do it with 2 folly::Endian::bigs if needed.

Yuhta avatar Oct 16 '24 17:10 Yuhta

Status: I've added DecimalUtil::big() to swap bytes for int128_t, but there is an active PR at https://github.com/prestodb/presto/pull/23847 which changes UUID format on the Presto Java side. There would need to be an additional adjustment to the changes here to compensate if the Java UUID PR is merged.

BryanCutler avatar Nov 04 '24 19:11 BryanCutler

I have updated to work with the latest changes from https://github.com/prestodb/presto/pull/23847. The serialization makes sure Presto Java receives values according to the format described in https://github.com/prestodb/presto/pull/23961#issuecomment-2460371370

I have manually tested these changes with Presto Java following the error description from https://github.com/prestodb/presto/issues/23311 and confirmed the values are now displayed correctly.

@Yuhta please review again, thanks.

BryanCutler avatar Nov 06 '24 22:11 BryanCutler

Thanks for reviewing @aditi-pandit . What do you think about me working on additional fuzzer tests as a followup if the rest of this PR is ready? I did add a serialization round trip test that goes through a full range of values, but I do think improving the fuzzer to use UUIDs sounds like a good addition.

BryanCutler avatar Nov 13 '24 01:11 BryanCutler

Thanks for reviewing @aditi-pandit . What do you think about me working on additional fuzzer tests as a followup if the rest of this PR is ready? I did add a serialization round trip test that goes through a full range of values, but I do think improving the fuzzer to use UUIDs sounds like a good addition.

Thats fine @BryanCutler. Thanks !

aditi-pandit avatar Nov 13 '24 07:11 aditi-pandit

@Yuhta please review, thanks

BryanCutler avatar Nov 13 '24 19:11 BryanCutler

@Yuhta please review, thanks!

BryanCutler avatar Nov 19 '24 17:11 BryanCutler

I missed a format fix, could you try again @Yuhta ?

BryanCutler avatar Nov 19 '24 22:11 BryanCutler

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 19 '24 23:11 facebook-github-bot

@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Nov 20 '24 18:11 facebook-github-bot

@kagamiori merged this pull request in facebookincubator/velox@fe4f5a77b90a7f98c1d90832fb6b018a46f7365d.

facebook-github-bot avatar Nov 21 '24 01:11 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit fe4f5a77.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

conbench-facebook[bot] avatar Nov 21 '24 01:11 conbench-facebook[bot]