velox
velox copied to clipboard
Add Spark connect proto files
Add Spark connect proto files used by Spark query runner. https://github.com/facebookincubator/velox/pull/9559
Deploy Preview for meta-velox canceled.
| Name | Link |
|---|---|
| Latest commit | d5a7c5d49cb8e607ec020b1eb968cb4d88eac128 |
| Latest deploy log | https://app.netlify.com/sites/meta-velox/deploys/66c2ba2c19efcb0008dd34ff |
@mbasmanova Can we add the proto files first to make https://github.com/facebookincubator/velox/pull/9559 easier for review? Thanks. The Spark aggregate fuzzer test with Spark as reference is able to run for 1 hour on CI jobs. https://github.com/facebookincubator/velox/actions/runs/9739822472/job/26875737110?pr=9559
@rui-mo Rui, thank you for working on this. Curious, if running Fuzzer against Spark helped discover any bugs. Also, curious which version of Spark do we run against and how did we choose it.
It makes sense to break up a big change into smaller pieces. For this change, I see it adds a number of .proto files, but these files are not compiled or used anywhere. Wondering if it makes sense to add SparkQueryRunner + test as one change and wire it up into Fuzzer as another.
I assume it is a bit not straightforward to run Fuzzer against Spark. It would be nice to include documentation that explains how to do that and how this works overall.
What do you think?
@mbasmanova Thank you for providing the clear guidance. I will follow your suggestion to separate this work and add documentation.
Curious, if running Fuzzer against Spark helped discover any bugs. Also, curious which version of Spark do we run against and how did we choose it.
One bug on sum(real) was found and has been fixed by https://github.com/facebookincubator/velox/commit/d12fa90be87e66260f7b9289a54b4d9c4d0151ff. We are running against Spark v3.5.1, because it is the latest release of Spark.
@mbasmanova Would you like to take a review again? This PR covers SparkQueryRunner & test & documentation. Thanks.
@mbasmanova @majetideepak @assignUser @kgpai Thanks for your review. Would you like to take another look?
@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kgpai Thanks for the review. This PR adds SparkQueryRunner + test and another PR will wire it up into the Fuzzer test.
@xiaoxmeng @kgpai Thanks for your help.
Hi @rui-mo , Would it be possible to use an older version of c-ares (1.13) for this pupose ? We are dependent on this particular version and i will be able to land this much faster if we can use this version , else I will have to work with other dependent teams to upgrade this to 1.17
Would it be possible to use an older version of c-ares (1.13) for this pupose ?
@kgpai I will take a look. Thanks.
@kgpai Updated in https://github.com/facebookincubator/velox/pull/10357/commits/98398971b70a185fc70c289d26740268de38ce52 to use c-ares (1.13). A patch from https://github.com/c-ares/c-ares/commit/fa903fd7bb124226e2832fc071ba241653cfe49c is applied to solve its compilation failure. Would you take another look? Thanks.
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@kgpai A new commit https://github.com/facebookincubator/velox/pull/10357/commits/d5a7c5d49cb8e607ec020b1eb968cb4d88eac128 has been added to work with d23dde5aa2bec01b7788af8455e49e81f50e6cc7. This PR might need to be re-imported. Thanks for your help.
@kgpai merged this pull request in facebookincubator/velox@82bad79442b80047981fde5e00fd8122f167e43c.
Conbench analyzed the 1 benchmark run on commit 82bad794.
There were no benchmark performance regressions. 🎉
The full Conbench report has more details.
@rui-mo @kgpai
could we add new build options like VELOX_ENABLE_SPARK_RUNNER to control downloading related deps?
now, absl, c-ares, grpc, spark will be downloaded when VELOX_ENABLE_TESTING was enabled, but many user does not really want to.
@Yohahaha In the next step, we will use the Spark query runner as the reference database for the Spark fuzzer test, so I assume if we need to run the fuzzer, the Spark query runner is needed. But it's fine to me to add an option like 'VELOX_ENABLE_FUZZER' if you think it's necessary @kgpai. Thanks.
Presto fuzzer previously brought in some extra dependencies as well. We should have a CMake option to enable all fuzzers VELOX_ENABLE_FUZZERS.
I opened a PR here for this https://github.com/facebookincubator/velox/pull/10802
I opened a PR here for this #10802
@majetideepak Thanks!