ray
ray copied to clipboard
[release][no_ci] Make sure that test code matches the installed wheel.
Signed-off-by: xwjiang2010 [email protected]
Why are these changes needed?
When a daily release test is launched, the following may happen:
- Latest wheel on master is wheel_n
- a PR (that modifies both the test code and the source code) is landed.
- a wheel is being built (let's call it wheel_n+1)
- Now nightly release test starts to run.
- in run_release_test.sh, we fetch most recent wheel and install it. This is wheel_n, since wheel_n+1 is still being built.
- in run_release_tst.sh, we check out test code, which is current master (includes PR_n+1).
- Now there is a discrepancy between source code (wheel version) and test code.
This PR tries to address this inconsistency issue.
Related issue number
Closes #30055
Checks
- [ ] I've signed off every commit(by using the -s flag, i.e.,
git commit -s
) in this PR. - [ ] I've run
scripts/format.sh
to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
I'm not sure this is the best way to go about this. Instead we should probably grab the HEAD commit in
build_pipeline.py
(we do a git clone there) and propagate it to the script viaRAY_TEST_COMMIT
. Then checkout that commit if the variable is set.
I am not sure I understand the comment around build_pipeline.py
, see above comment.
It would be good to test this if possible. There are existing unit tests for the step building process that can be used as a starting point.
Yes I will add tests after we settle on the implementations.
Also, can you please add a bit of context to the PR (what is the problem, when does it come up, what do we do to solve it). This is just for future reference.
Sure, updated the description to have more details. PTAL
Thanks!
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
- If you'd like to keep this open, just leave any comment, and the stale label will be removed.
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
Please feel free to reopen or open a new issue if you'd still like it to be addressed.
Again, you can always ask for help on our discussion forum or Ray's public slack channel.
Thanks again for opening the issue!
Actually I have seen this issue during my last oncall again twice. I will try to revive this PR and land it
@krfricke sorry for the delay on this one. I saw two occasions of this my recent oncall. So let's see how to land this. Happy to address new comments.
also cc @can-anyscale
Also no idea how to add proper test... open to suggestions folks :)
Oh wow, this has been from last year, nice work though!
Might be a dummy idea, but have we thought about just unzip the wheel to get the test code instead of git clone?
Might be a dummy idea, but have we thought about just unzip the wheel to get the test code instead of git clone?
Not sure if release code is part of our wheel tho, it shouldn't right?
Ah, you're absolutely right, it does not
@krfricke do you know how to run a manual test for this? The problem with "buildkite/release-tests-pr" is that it will be on my own repo and branch and hence the logic won't kick in :(
Does it work if you create a new build from here https://buildkite.com/ray-project/release-tests-pr/builds?branch=xwjiang2010%3Afix_release_test_racing, then unfold the Option to specify the environment variables
Does it work if you create a new build from here https://buildkite.com/ray-project/release-tests-pr/builds?branch=xwjiang2010%3Afix_release_test_racing, then unfold the Option to specify the environment variables
It wont work right? This is the case that my test_repo and test_branch are not ray master. So the gating logic will kick in and not resetting hard the test commit.
Can you override the test_repo and test_branch by passing the environment variables when creating a buildkite build?
release-tests-pr shows
but doesn't really enter the added logic blob.
Well, let's test in production then ;)
checked bash portion manually with
if [ "${RAY_TEST_REPO}" == "https://github.com/ray-project/ray.git" ] && \
[[ "${PARSED_RAY_WHEELS}" == *"master"* ]] && \
[ "${RAY_TEST_BRANCH-}" == "master" ] && [ -n "${RAY_COMMIT_OF_WHEEL-}" ] && \
[ "${HEAD_COMMIT}" != "${RAY_COMMIT_OF_WHEEL}" ]; then
echo "The checked out test code doesn't match with the installed wheel. \
This is likely due to a racing condition when a PR is landed between \
a wheel is installed and test code is checked out."
echo "Hard resetting from ${HEAD_COMMIT} to ${RAY_COMMIT_OF_WHEEL}."
echo "git reset --hard ${RAY_COMMIT_OF_WHEEL}"
fi
and with vars
export RAY_TEST_REPO=https://github.com/ray-project/ray.git
export PARSED_RAY_WHEELS=https://s3-us-west-2.amazonaws.com/ray-wheels/master/0e0c15065507f01e8bfe78e49b0d0de063f81164/ray-3.0.0.dev0-cp37-cp37m-manylinux2014_x86_64.whl
export RAY_TEST_BRANCH=master
export RAY_COMMIT_OF_WHEEL=ef5bff7fd8b00bfdc6e5d2b20ae74f299339dfbe
export HEAD_COMMIT=ec2c80dde9104d1fc355c4a0f3a2f0982cdf1056
the right content is printed. I will monitor the production when I land it.
Lint and documentation error are not related. merging now.
Testing in production
https://buildkite.com/ray-project/release-tests-branch/builds/1583
Verified that it works in production.