velox icon indicating copy to clipboard operation
velox copied to clipboard

Add Spark connect proto files

Open rui-mo opened this issue 1 year ago • 4 comments

Add Spark connect proto files used by Spark query runner. https://github.com/facebookincubator/velox/pull/9559

rui-mo avatar Jul 01 '24 07:07 rui-mo

Deploy Preview for meta-velox canceled.

Name Link
Latest commit d5a7c5d49cb8e607ec020b1eb968cb4d88eac128
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66c2ba2c19efcb0008dd34ff

netlify[bot] avatar Jul 01 '24 07:07 netlify[bot]

@mbasmanova Can we add the proto files first to make https://github.com/facebookincubator/velox/pull/9559 easier for review? Thanks. The Spark aggregate fuzzer test with Spark as reference is able to run for 1 hour on CI jobs. https://github.com/facebookincubator/velox/actions/runs/9739822472/job/26875737110?pr=9559

rui-mo avatar Jul 02 '24 02:07 rui-mo

@rui-mo Rui, thank you for working on this. Curious, if running Fuzzer against Spark helped discover any bugs. Also, curious which version of Spark do we run against and how did we choose it.

It makes sense to break up a big change into smaller pieces. For this change, I see it adds a number of .proto files, but these files are not compiled or used anywhere. Wondering if it makes sense to add SparkQueryRunner + test as one change and wire it up into Fuzzer as another.

I assume it is a bit not straightforward to run Fuzzer against Spark. It would be nice to include documentation that explains how to do that and how this works overall.

What do you think?

mbasmanova avatar Jul 02 '24 10:07 mbasmanova

@mbasmanova Thank you for providing the clear guidance. I will follow your suggestion to separate this work and add documentation.

Curious, if running Fuzzer against Spark helped discover any bugs. Also, curious which version of Spark do we run against and how did we choose it.

One bug on sum(real) was found and has been fixed by https://github.com/facebookincubator/velox/commit/d12fa90be87e66260f7b9289a54b4d9c4d0151ff. We are running against Spark v3.5.1, because it is the latest release of Spark.

rui-mo avatar Jul 02 '24 13:07 rui-mo

@mbasmanova Would you like to take a review again? This PR covers SparkQueryRunner & test & documentation. Thanks.

rui-mo avatar Jul 16 '24 14:07 rui-mo

@mbasmanova @majetideepak @assignUser @kgpai Thanks for your review. Would you like to take another look?

rui-mo avatar Aug 02 '24 12:08 rui-mo

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 08 '24 17:08 facebook-github-bot

@kgpai Thanks for the review. This PR adds SparkQueryRunner + test and another PR will wire it up into the Fuzzer test.

rui-mo avatar Aug 09 '24 00:08 rui-mo

@xiaoxmeng @kgpai Thanks for your help.

rui-mo avatar Aug 14 '24 02:08 rui-mo

Hi @rui-mo , Would it be possible to use an older version of c-ares (1.13) for this pupose ? We are dependent on this particular version and i will be able to land this much faster if we can use this version , else I will have to work with other dependent teams to upgrade this to 1.17

kgpai avatar Aug 14 '24 17:08 kgpai

Would it be possible to use an older version of c-ares (1.13) for this pupose ?

@kgpai I will take a look. Thanks.

rui-mo avatar Aug 15 '24 00:08 rui-mo

@kgpai Updated in https://github.com/facebookincubator/velox/pull/10357/commits/98398971b70a185fc70c289d26740268de38ce52 to use c-ares (1.13). A patch from https://github.com/c-ares/c-ares/commit/fa903fd7bb124226e2832fc071ba241653cfe49c is applied to solve its compilation failure. Would you take another look? Thanks.

rui-mo avatar Aug 15 '24 10:08 rui-mo

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Aug 16 '24 16:08 facebook-github-bot

@kgpai A new commit https://github.com/facebookincubator/velox/pull/10357/commits/d5a7c5d49cb8e607ec020b1eb968cb4d88eac128 has been added to work with d23dde5aa2bec01b7788af8455e49e81f50e6cc7. This PR might need to be re-imported. Thanks for your help.

rui-mo avatar Aug 19 '24 03:08 rui-mo

@kgpai merged this pull request in facebookincubator/velox@82bad79442b80047981fde5e00fd8122f167e43c.

facebook-github-bot avatar Aug 20 '24 19:08 facebook-github-bot

Conbench analyzed the 1 benchmark run on commit 82bad794.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

conbench-facebook[bot] avatar Aug 20 '24 20:08 conbench-facebook[bot]

@rui-mo @kgpai could we add new build options like VELOX_ENABLE_SPARK_RUNNER to control downloading related deps? now, absl, c-ares, grpc, spark will be downloaded when VELOX_ENABLE_TESTING was enabled, but many user does not really want to.

Yohahaha avatar Aug 21 '24 02:08 Yohahaha

@Yohahaha In the next step, we will use the Spark query runner as the reference database for the Spark fuzzer test, so I assume if we need to run the fuzzer, the Spark query runner is needed. But it's fine to me to add an option like 'VELOX_ENABLE_FUZZER' if you think it's necessary @kgpai. Thanks.

rui-mo avatar Aug 21 '24 06:08 rui-mo

Presto fuzzer previously brought in some extra dependencies as well. We should have a CMake option to enable all fuzzers VELOX_ENABLE_FUZZERS.

majetideepak avatar Aug 21 '24 14:08 majetideepak

I opened a PR here for this https://github.com/facebookincubator/velox/pull/10802

majetideepak avatar Aug 21 '24 20:08 majetideepak

I opened a PR here for this #10802

@majetideepak Thanks!

rui-mo avatar Aug 22 '24 01:08 rui-mo