[DONOTMERGE] add CI tests for torch.compile'ing the transforms.v2 kernels
I don't intend to merge this. Rather, this should serve as base to get a feeling how far away we are from achieving our goal in #8056.
:link: Helpful Links
- :test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/8127
- :page_facing_up: Preview Python docs built from this PR
Note: Links to docs will display an error until the docs builds have been completed.
:x: 8 New Failures, 1 Unrelated Failure
As of commit 90ab254f1a9e833361b480f57e4d104de90e59cd with merge base 6e18cea3485066b7277785415bf2e0422dbdb9da:
NEW FAILURES - The following jobs have failed:
- Tests / compile (false, eager, false) / dynamic=false,backend=eager,fullgraph=false (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (false, eager, true) / dynamic=false,backend=eager,fullgraph=true (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (false, inductor, false) / dynamic=false,backend=inductor,fullgraph=false (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (false, inductor, true) / dynamic=false,backend=inductor,fullgraph=true (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (true, eager, false) / dynamic=true,backend=eager,fullgraph=false (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (true, eager, true) / dynamic=true,backend=eager,fullgraph=true (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (true, inductor, false) / dynamic=true,backend=inductor,fullgraph=false (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
- Tests / compile (true, inductor, true) / dynamic=true,backend=inductor,fullgraph=true (gh)
  `test/test_transforms_v2.py::TestJPEG::test_kernel_video`
BROKEN TRUNK - The following job failed but was already failing on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
With torch-2.2.0.dev20231121+cpu and torchvision @ 893b4abdc0c9df36c241c58769810f69e35dab48:
- backend='eager', dynamic=True: 259 failed
- backend='inductor', dynamic=True: 307 failed
- backend='eager', dynamic=False: 98 failed
- backend='inductor', dynamic=False: 110 failed
Some relevant issues on core that will resolve a lot of the issues:
- pytorch/pytorch#114218
- ~pytorch/pytorch#114220~ avoided by #8171
- pytorch/pytorch#114231
- pytorch/pytorch#114310
- pytorch/pytorch#114353
- pytorch/pytorch#114354
- pytorch/pytorch#114464
- pytorch/pytorch#114483
- pytorch/pytorch#114866
Failures on pad_mask with fullgraph are a test error and are fixed in #8132.
In c5c72abeef04db4852e7ffb764c1c872aacbd25c and 29ea48a71df246bbcc9a4b4dff44baeaa45513fe, I've added atol=1, rtol=0 for uint8 / bilinear resize. With this, the tests now pass in the most lenient setting, i.e. static shapes, eager backend, and graph breaks allowed: https://github.com/pytorch/vision/actions/runs/6971806789/job/18972632003?pr=8127 :tada:
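For uint8 values, atol=1, rtol=0 means each output value may differ from its reference by at most one intensity level. Below is a minimal sketch in plain Python of the closeness criterion `|actual - expected| <= atol + rtol * |expected|`, which is the one documented for `torch.testing.assert_close`. The helper name `all_close` and the pixel values are made up for illustration and are not taken from the test suite.

```python
def all_close(actual, expected, atol, rtol):
    """Return True if every pair satisfies |a - e| <= atol + rtol * |e|."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Illustrative uint8 pixel values (assumed, not from the PR).
expected = [0, 127, 128, 255]
off_by_one = [1, 126, 128, 254]   # every value within one intensity level
off_by_two = [2, 127, 128, 255]   # first value is two levels off

print(all_close(off_by_one, expected, atol=1, rtol=0))  # True
print(all_close(off_by_two, expected, atol=1, rtol=0))  # False
```

With rtol=0 the allowed deviation does not grow with the pixel value, so dark and bright pixels are held to the same one-level budget.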
With torch-2.3.0.dev20231218+cpu and torchvision @ 6c2e0ae88b056ba2ac897d4a7c1b7153cefcb444:
| dynamic | backend | fullgraph | failing tests |
|---|---|---|---|
| False | eager | False | 0 |
| False | eager | True | 63 |
| False | inductor | False | 8 |
| False | inductor | True | 71 |
| True | eager | False | 72 |
| True | eager | True | 192 |
| True | inductor | False | 86 |
| True | inductor | True | 206 |
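The eight rows above are the Cartesian product of the three `torch.compile` options. Here is a minimal sketch of how such a matrix could be enumerated. The job-name format mirrors the CI job names reported by Dr. CI, while `compile_matrix` is a hypothetical helper and not code from this PR.

```python
from itertools import product


def compile_matrix():
    """Yield (job_name, kwargs) for every dynamic/backend/fullgraph combination."""
    for dynamic, backend, fullgraph in product(
        [False, True], ["eager", "inductor"], [False, True]
    ):
        # Keyword names match the dynamic, backend and fullgraph
        # parameters of torch.compile.
        kwargs = dict(dynamic=dynamic, backend=backend, fullgraph=fullgraph)
        job_name = (
            f"dynamic={str(dynamic).lower()},"
            f"backend={backend},"
            f"fullgraph={str(fullgraph).lower()}"
        )
        yield job_name, kwargs


jobs = list(compile_matrix())
for job_name, _ in jobs:
    print(job_name)
```

Each `kwargs` dict could then be passed along as `torch.compile(kernel, **kwargs)`, where `kernel` stands for whichever function is under test.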
I've factored out e6a54bf78732a9c27f2dfb56364219d726a7a4bd into #8171.
With torch-2.3.0.dev20231222+cpu and torchvision @ 26fb5efe9d046ed9bf059b393f5785b9bdd9ec7e:
| dynamic | backend | fullgraph | failing tests | diff to previous |
|---|---|---|---|---|
| False | eager | False | 0 | 0 |
| False | eager | True | 23 | -40 |
| False | inductor | False | 8 | 0 |
| False | inductor | True | 31 | -40 |
| True | eager | False | 72 | 0 |
| True | eager | True | 95 | -97 |
| True | inductor | False | 80 | -6 |
| True | inductor | True | 103 | -103 |
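The "diff to previous" column can be recomputed from the failure counts of the two nightlies. A small sketch using the numbers from the two tables above (the variable names are illustrative):

```python
# Row order matches the tables: dynamic, then backend, then fullgraph.
configs = [
    (dynamic, backend, fullgraph)
    for dynamic in (False, True)
    for backend in ("eager", "inductor")
    for fullgraph in (False, True)
]

# Failing tests with torch-2.3.0.dev20231218+cpu and torch-2.3.0.dev20231222+cpu.
previous = [0, 63, 8, 71, 72, 192, 86, 206]
current = [0, 23, 8, 31, 72, 95, 80, 103]

diffs = [c - p for p, c in zip(previous, current)]

for (dynamic, backend, fullgraph), diff in zip(configs, diffs):
    print(f"{dynamic!s:5} {backend:8} {fullgraph!s:5} {diff:+d}")
```

The largest drops are in the fullgraph=True rows; the fullgraph=False rows change only for dynamic=True with the inductor backend.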
Some great progress with torch-2.3.0.dev20240117+cpu and torchvision @ 1de7a74a8b93483f1703eef0b306e0ec68e0cd9d where 72 failing tests were resolved from all of the dynamic=True jobs.
With torch-2.4.0.dev20240401+cpu and torchvision https://github.com/pytorch/vision/commit/5181a854d8b127cf465cd22a67c1b5aaf6ccae05 we have the following failures:
- test.test_transforms_v2.TestPerspective: image/bbox - due to `torch._dynamo.exc.Unsupported: dynamic shape operator: aten.linalg_lstsq.default` - can't be fixed now
- test.test_transforms_v2.TestSanitizeBoundingBoxes.test_kernel - due to `torch._dynamo.exc.Unsupported: dynamic shape operator: aten.nonzero.default` - probably can't be fixed either
- test.test_transforms_v2.TestJPEG: image/video - due to `torch._dynamo.exc.TorchRuntimeError: Failed running call_function image.encode_jpeg(*(FakeTensor(..., size=(3, 17, 11), dtype=torch.uint8), 5), **{}):`