velox
velox copied to clipboard
Fix Presto Java UUID serialization
This fixes the PrestoSerializer to put UUID values in the correct format that is expected by Presto Java so that the values will match those from a Java worker. First, when converting UUID to/from string, the values are no longer in big endian format (as taken from boost::uuid) and are instead stored as a little endian in an int128_t. Secondly, Presto Java will read UUID values from an Int128ArrayBlock with the first value as the most significant bits. To correct this, the upper/lower parts of the int128_t are swapped during serialization/deserialization.
A unit test for checking roundtrip UUID serializaiton was added and manual testing of Presto with a native worker to verify the problem from the issue description is fixed.
From https://github.com/prestodb/presto/issues/23311
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | bf000b0c87f96aa94312b6b1e52129ef88c681c1 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/673d76747b8ff600082c98c2 |
please review @aditi-pandit @Yuhta @mbasmanova
Thanks @Yuhta , but folly::Endian::big is not defined for int128 types. It's based off of compiler built-in bytes swapping functions, so it looks like it could be added for GCC variants, but there is no built-in function for MSVC. I could add a check to these changes to only swap if the host system is LE, wdyt?
@BryanCutler You can add a utility to DecimalUtil to do that. We can do it with 2 folly::Endian::bigs if needed.
Status: I've added DecimalUtil::big() to swap bytes for int128_t, but there is an active PR at https://github.com/prestodb/presto/pull/23847 which changes UUID format on the Presto Java side. There would need to be an additional adjustment to the changes here to compensate if the Java UUID PR is merged.
I have updated to work with the latest changes from https://github.com/prestodb/presto/pull/23847. The serialization makes sure Presto Java receives values according to the format described in https://github.com/prestodb/presto/pull/23961#issuecomment-2460371370
I have manually tested these changes with Presto Java following the error description from https://github.com/prestodb/presto/issues/23311 and confirmed the values are now displayed correctly.
@Yuhta please review again, thanks.
Thanks for reviewing @aditi-pandit . What do you think about me working on additional fuzzer tests as a followup if the rest of this PR is ready? I did add a serialization round trip test that goes through a full range of values, but I do think improving the fuzzer to use UUIDs sounds like a good addition.
Thanks for reviewing @aditi-pandit . What do you think about me working on additional fuzzer tests as a followup if the rest of this PR is ready? I did add a serialization round trip test that goes through a full range of values, but I do think improving the fuzzer to use UUIDs sounds like a good addition.
Thats fine @BryanCutler. Thanks !
@Yuhta please review, thanks
@Yuhta please review, thanks!
I missed a format fix, could you try again @Yuhta ?
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kagamiori has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kagamiori merged this pull request in facebookincubator/velox@fe4f5a77b90a7f98c1d90832fb6b018a46f7365d.
Conbench analyzed the 1 benchmark run on commit fe4f5a77.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details.