velox icon indicating copy to clipboard operation
velox copied to clipboard

Add nightly build on GPU-enabled workers

Open luhenry opened this issue 1 year ago • 11 comments

It would be too constraining to build and run on GPU-enabled workers for all pull request and branch pushes. We still want to build the GPU code on all pull requests to reduce the chances of regressions.

luhenry avatar Apr 18 '24 11:04 luhenry

Deploy Preview for meta-velox canceled.

Name Link
Latest commit 60f44370ddc352bd5c68ced0c1c47aa76ffbfad6
Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/662218be54bf3f00087ab0a7

netlify[bot] avatar Apr 18 '24 11:04 netlify[bot]

@assignUser @pedroerp @Yuhta could I please get your review. I've triggered a build of Linux Nightly Build at https://github.com/facebookincubator/velox/actions/workflows/linux-nightly-build.yml to make sure it works as expected on the 4-core-ubuntu-gpu-t4 runners. Thanks

luhenry avatar Apr 18 '24 11:04 luhenry

Can we have something smarter that if a PR touches certain directories in the codebase then we still trigger a GPU test run on the PR?

Yuhta avatar Apr 18 '24 15:04 Yuhta

That would also make it easy to only run the wave tests and not the unrelated, non-gpu enabled tests. You can check out the fuzzer workflow on how to do that.

The plan will be to make more and more of the codebase GPU-enabled, so the line will get very blurry and will not be limited to velox/experimental/wave only.

Could we, to further optimize cost, build on ubuntu-latest (no need for a larger instance imo as no one is waiting for feedback in their PR) with gpu=on and move the test binaries to the gpu runner via artifact upload?

The building part takes 25 min, once a day (nightly). Should we try to optimize cost much further? I can definitely take a look but the complexity will not be trivial as the artifacts would be multiple GBs of data as we need the whole _build/release folder for CTest to successfully run tests from it.

Also, the tests successfully ran on GPU at https://github.com/facebookincubator/velox/actions/runs/8740218005/job/23983395524 :tada:

luhenry avatar Apr 19 '24 07:04 luhenry

@luhenry GPU related code should still be concentrated in velox/experimental/wave, I don't see it going to spread everywhere. As @assignUser pointed out though, common code change could also break GPU code, but that is relatively rare and I think we can live with a filter on velox/experimental/{gpu,wave}.

Yuhta avatar Apr 19 '24 15:04 Yuhta

@luhenry Ah I thought it would be just a few test in wave/. Could you then add a label to gpu enabled tests and run only those? See https://cmake.org/cmake/help/latest/command/set_tests_properties.html with the LABELS property.

@Yuhta we added the wave build to the adapters build that is run on PR so it's only about potentially also running the tests on PRs. I am not familiar enough with the subject matter to make a call on how necessary that is and will defer to you and @pedroerp for a call on that.

assignUser avatar Apr 19 '24 22:04 assignUser

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale[bot] avatar Jul 19 '24 04:07 stale[bot]

@luhenry are you guys still planning to help us with this?

pedroerp avatar Jul 19 '24 16:07 pedroerp

@pedroerp absolutely! Let me rebase and fix conflicts, and we can resume the work

luhenry avatar Jul 22 '24 11:07 luhenry

Great! Let me know if I can help

pedroerp avatar Jul 24 '24 15:07 pedroerp

This pull request has been automatically marked as stale because it has not had recent activity. If you'd still like this PR merged, please comment on the PR, make sure you've addressed reviewer comments, and rebase on the latest main. Thank you for your contributions!

stale[bot] avatar Oct 24 '24 20:10 stale[bot]