test-infra
[Dr CI] Wrong classification for XLA
Pretty sure the XLA failure on this PR was real.
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/124920
- :page_facing_up: Preview Python docs built from this PR
- :page_facing_up: Preview C++ docs built from this PR
- :question: Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours
Note: Links to docs will display an error until the docs builds have been completed.
:white_check_mark: You can merge normally! (2 Unrelated Failures)
As of commit a6516ea6789e12a1a80a8c8cc7ce63698d443821 with merge base 59a1f1f308545e3ac1d81940a51f8dc0db3d82d4 ():
FLAKY - The following job failed but was likely due to flakiness present on trunk:
- pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge) (gh)
test_all_cpu_tensor
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
- pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu) (gh)
inductor/test_cudagraph_trees.py::CudaGraphTreeTests::test_mutation_cudagraph_managed_tensors_config_backend_cudagraphs
This comment was automatically generated by Dr. CI and updates every 15 minutes.
I think it matched against https://github.com/pytorch/pytorch/actions/runs/8819380894/job/24214658473, which was recent and has the same test name but a different error trace. However, as far as I can tell, that failure doesn't show up on the main branch. Does the flaky check match against failures from all branches? Could it be changed to match only against main?
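The stricter matching asked about here can be sketched roughly as follows. This is a hypothetical illustration, not Dr. CI's actual code: the `JobFailure` record, its fields, and `is_likely_flaky` are invented names. The sketch assumes a PR failure counts as flaky only when an earlier job on `main` failed the same test with the same failure line, so a same-named test on another branch with a different trace no longer matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobFailure:
    # Hypothetical record of one failed CI job (illustrative fields only).
    branch: str
    test_name: str
    failure_line: str


def is_likely_flaky(pr_failure: JobFailure, history: list[JobFailure],
                    trunk_branch: str = "main") -> bool:
    """Return True only if an earlier failure on the trunk branch matches both
    the test name and the failure line; a test-name match alone is not enough."""
    for candidate in history:
        if candidate.branch != trunk_branch:
            continue  # ignore failures recorded on other (PR) branches
        if (candidate.test_name == pr_failure.test_name
                and candidate.failure_line == pr_failure.failure_line):
            return True
    return False


# Placeholder traces; the real error text from the linked job is not reproduced here.
pr = JobFailure("my-pr-branch", "test_all_cpu_tensor", "error trace A")
history = [JobFailure("other-pr-branch", "test_all_cpu_tensor", "error trace B")]
print(is_likely_flaky(pr, history))  # False: different branch and different trace
```

Requiring both conditions trades some recall (genuinely flaky tests with varying traces are no longer auto-classified) for fewer real failures being marked as flaky.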
cc @huydhn
I'm trying to figure out a solution for this case. From what I see, XLA error matching is sometimes poor because I haven't paid much attention to what runs on the XLA side, so the log classifier doesn't support it well yet. One common mismatch is `ModuleNotFoundError: No module named 'torch.version'`, which appears in every XLA test job.
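One way to keep a line that appears in every XLA job from being picked as "the" error is to skip known-benign lines before choosing the failure line to match on. The sketch below is a hypothetical illustration and is not necessarily what the fix in test-infra PR 5151 does: `IGNORED_XLA_LINES`, `ERROR_PATTERN`, and `first_meaningful_error` are invented names, and the error regex is a simplifying assumption.

```python
import re

# Hypothetical ignore list: lines that show up in every XLA test job and so
# say nothing about why a particular job failed (assumption for illustration).
IGNORED_XLA_LINES = [
    re.compile(r"ModuleNotFoundError: No module named 'torch\.version'"),
]

# Hypothetical, deliberately simple pattern for lines that look like a failure.
ERROR_PATTERN = re.compile(r"Error|FAILED|Traceback")


def first_meaningful_error(log_lines):
    """Return the first line that looks like an error and is not on the
    ignore list, or None if no such line exists."""
    for line in log_lines:
        if not ERROR_PATTERN.search(line):
            continue  # not an error-looking line
        if any(pattern.search(line) for pattern in IGNORED_XLA_LINES):
            continue  # benign line present in every XLA job
        return line.strip()
    return None


log = [
    "ModuleNotFoundError: No module named 'torch.version'",
    "running test_all_cpu_tensor",
    "RuntimeError: placeholder for the real failure",
]
print(first_meaningful_error(log))  # RuntimeError: placeholder for the real failure
```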
This has been fixed by https://github.com/pytorch/test-infra/pull/5151, so I will close this.