CI monitoring rotation schedule
See the CI Monitoring Runbook for context.
This schedule tracks the CI monitoring rotation. If you would like to join, please comment on this issue or, if you are a committer, edit this issue directly.
| Week | On-call |
|---|---|
| 8/1 - 8/5 | @driazati |
| 8/8 - 8/12 | @gigiblender |
| 8/15 - 8/19 | @shingjan |
| 8/22 - 8/26 | --- |
| 8/29 - 9/2 | --- |
| 9/5 - 9/9 | --- |
| 9/12 - 9/16 | --- |
This isn't part of the regular rotation, but I thought I'd post a summary of my week so everyone has more visibility into the process. Last week had lots of infra problems, which are hopefully fixed now that we've increased capacity limits and fixed our cleanup logic.
- jenkins on c253053 - reverted in #11496
- jenkins on 01ee1bc - failed due to a Jenkins reboot
- jenkins on 6f3c8bd - failed due to a Jenkins reboot
- jenkins on 7766ab2 - failed due to a Jenkins reboot
- jenkins on 2a2d910 - failed due to a Jenkins reboot
- jenkins on 6895087 - network problems, filed #11492
- jenkins on cfcca59 - node ran out of disk, fixed in https://github.com/tlc-pack/ci-terraform/pull/41 and #11491
- jenkins on 2f21698 - node ran out of disk, fixed in https://github.com/tlc-pack/ci-terraform/pull/41 and #11491
- jenkins on b535e46 - node ran out of disk, fixed in https://github.com/tlc-pack/ci-terraform/pull/41 and #11491
- jenkins on 52df2e8 - node ran out of disk, fixed in https://github.com/tlc-pack/ci-terraform/pull/41 and #11491
- jenkins on a9ece3d - node ran out of disk, fixed in https://github.com/tlc-pack/ci-terraform/pull/41 and #11491
- jenkins on 8135860 - timed out due to queuing
- jenkins on d519b03 - timed out due to queuing
- jenkins on 92cc5b0 - timed out due to queuing
- jenkins on bbdb656 - network problems and flaky test, filed #11459 and #11458
- jenkins on 6c6dfbc - fixed by #11457
- jenkins on c247295 - fixed by #11456
- jenkins on 014208e - fixed by #11456
- jenkins on 3f53e7a - fixed by #11456
- jenkins on 7e83c4a - fixed by #11456
- jenkins on 7ba8a61 - failed due to a
- jenkins on b141cac - filed #11440
- gha on 7bab8f7 - filed #11416
Summary from my week: things were fairly quiet, mainly some flaky tests.
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3414/pipeline) on c1b22ee - flaky test, filed https://github.com/apache/tvm/issues/11527
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3415/pipeline) on ac5d781 - continued flaky paddlepaddle test, updated https://github.com/apache/tvm/issues/9976; a better diagnostic was submitted today.
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3411/pipeline) on bc14f26 - network flake, filed https://github.com/apache/tvm/issues/11514
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3444/tests) on 2ae2088 - assert_allclose 2/10000 mismatched (0.02%), filed https://github.com/apache/tvm/issues/11568
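The last entry above is an `assert_allclose` failure where only 2 of 10,000 output elements (0.02%) fell outside tolerance. As an illustration of what that count means, here is a minimal pure-Python sketch of the elementwise check `numpy.testing.assert_allclose` applies, `|actual - desired| <= atol + rtol * |desired|`, using its default `rtol=1e-7`, `atol=0`. The helper name `count_mismatches` is hypothetical, and NaN/infinity handling is deliberately left out.

```python
from typing import Sequence, Tuple


def count_mismatches(actual: Sequence[float], desired: Sequence[float],
                     rtol: float = 1e-7, atol: float = 0.0) -> Tuple[int, float]:
    """Return (mismatch count, mismatch percentage) under an allclose-style check.

    An element passes when |actual - desired| <= atol + rtol * |desired|.
    Inputs are assumed finite; NaN/inf handling is omitted.
    """
    if len(actual) != len(desired):
        raise ValueError("actual and desired must have the same length")
    if not desired:
        return 0, 0.0
    mismatches = sum(
        1 for a, d in zip(actual, desired)
        if not abs(a - d) <= atol + rtol * abs(d)
    )
    return mismatches, 100.0 * mismatches / len(desired)


if __name__ == "__main__":
    desired = [1.0] * 10000
    actual = list(desired)
    actual[17] += 1e-3    # two elements drift past the tolerance
    actual[4242] -= 1e-3
    n, pct = count_mismatches(actual, desired, rtol=1e-5, atol=1e-5)
    print(f"{n}/{len(desired)} mismatched ({pct:.2f}%)")
```

With the synthetic data above this reports the same shape of result as the CI log: a handful of borderline elements, not a wholesale wrong answer.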
@areusch @driazati I can take next week
Summary from 6/13/22 - 6/19/22: mostly infra and flaky test issues
- jenkins on https://github.com/apache/tvm/commit/2df4524e04cf48f759175a746632efe6ff0a7ea6 https://github.com/apache/tvm/issues/11710
- jenkins on https://github.com/apache/tvm/commit/81cc0864004bb64c8c70ce0ed1abbc3a8755458c https://github.com/apache/tvm/issues/11713
- gha on https://github.com/apache/tvm/commit/a82d2f516e0f484ad3d91fa2dd9997cfc016893f https://github.com/apache/tvm/issues/11738
- jenkins on https://github.com/dos-lab/as-tvm/commit/583696fceb8abd4c3c2f4abb93d0dbce2e79f60f https://github.com/apache/tvm/issues/11749
- jenkins https://github.com/apache/tvm/issues/11763
- jenkins on https://github.com/apache/tvm/commit/b4a77ac7f4ed9c639a28468133b22d9b03c69bf7 https://github.com/apache/tvm/issues/11764
- jenkins on https://github.com/apache/tvm/commit/65d45af54b7a90f759fe8effb4abe71209b8e08e https://github.com/apache/tvm/issues/11804
- gha on https://github.com/apache/tvm/commit/0fdc0eab5199d1b6549d2b2f94c83d86d5545e81 https://github.com/apache/tvm/issues/11805
Happy to sign up for 6/27-7/1 if someone could edit the issue for me :)
Week of 6/20 - 6/27:
- Details on bc75487 #11830
- Details on b63801c #11811
- Details on a363a04 #11811
- Details on 7b0f791 #11811
- Details on a5366a7 #11811
- Details on 32d16eb #11917
- Details on 98fb955 #11840
- Details on 6ed3ab3 #11840
- Details on 1e0e954 (fixed; not sure what the issue was)
- Details on 51b0d8c #11580
- Details on 410e836 #11580
- Details on 12e8744 #11918
- Details on 5e81067 #11580
- Details on 4cb18b4 #11919
- Details on aa66e9f #11918
- Details on 600a201 #11840
- Details on 98bf40f #11920
I can help on 7/11-7/15 :)
27/06/2022:
- jenkins on 1115fd9bc261619ffa0539746ae0aebc46232dc6 - aborted due to CI disk space issue

28/06/2022:
- jenkins on 6c8a3530998d46182fe0887a24739d21961a056d - https://github.com/apache/tvm/issues/11580
- jenkins on 688b0825e2ea9ffedafaa83d4027701a4a8a67d1 - https://github.com/apache/tvm/issues/11580

29/06/2022:
- jenkins on a17bfc05cc127dd8d3922d7b79b7ff4754893d49 - https://github.com/apache/tvm/issues/11964

30/06/2022:
- jenkins on 41c94b27ef5f10ad70af211dd25c4837dad53f64 - https://github.com/apache/tvm/issues/11967
- jenkins on 898946fec60898b8fa753d6f0cdf8ebc86c9827a - https://github.com/apache/tvm/issues/11567
- jenkins on 558ba99c7cad6fa5f01cfdb2bd6bdd2cec6087db - https://github.com/apache/tvm/issues/12004 and flaky due to network
- jenkins on e7851ed763cd9e7e64c1e298908297d3f4ba93c7 - https://github.com/apache/tvm/issues/11967
- jenkins on 80a0c6c53dc7e3aca2bc52755fabbad76cbac35a - https://github.com/apache/tvm/issues/11967
- jenkins on 915c23b61b34604b19217759f320c84d3aa60605 - https://github.com/apache/tvm/issues/11967
- jenkins on c0f4bf72b6ee30648ef78ce865afc733c95fe98c - https://github.com/apache/tvm/issues/11967

01/07/2022:
- jenkins on ec39199edb72dfe93747249d6a060c1832a8e38f - https://github.com/apache/tvm/issues/11580
- jenkins on 395e91ff54543864a90240d18c8efd8c277c758b - https://github.com/apache/tvm/issues/11580
- jenkins on 9e14509cabf9e6ba674d819d36b5d29f97f3dc2f - https://github.com/apache/tvm/issues/11568
- jenkins on 50cd4d635cb0947e90d5d8ecdd94baeabf57ab31 - https://github.com/apache/tvm/issues/11568
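To see which issue dominated the 27/06 - 01/07 week, the entries that have a tracking issue can be tallied per issue. This is a minimal sketch with the records hand-copied from the log above (short SHAs, and the disk-space abort left out because it has no issue); the `FAILURES` list and `failures_per_issue` function are hypothetical and not part of any TVM tooling.

```python
from collections import Counter

# (date, short commit SHA, tracking issue), hand-copied from the 27/06 - 01/07 log
FAILURES = [
    ("28/06/2022", "6c8a353", 11580),
    ("28/06/2022", "688b082", 11580),
    ("29/06/2022", "a17bfc0", 11964),
    ("30/06/2022", "41c94b2", 11967),
    ("30/06/2022", "898946f", 11567),
    ("30/06/2022", "558ba99", 12004),
    ("30/06/2022", "e7851ed", 11967),
    ("30/06/2022", "80a0c6c", 11967),
    ("30/06/2022", "915c23b", 11967),
    ("30/06/2022", "c0f4bf7", 11967),
    ("01/07/2022", "ec39199", 11580),
    ("01/07/2022", "395e91f", 11580),
    ("01/07/2022", "9e14509", 11568),
    ("01/07/2022", "50cd4d6", 11568),
]


def failures_per_issue(records):
    """Count how many main-branch failures were attributed to each issue."""
    return Counter(issue for _date, _sha, issue in records)


if __name__ == "__main__":
    # Most frequent issue first; ties keep first-seen order.
    for issue, count in failures_per_issue(FAILURES).most_common():
        print(f"#{issue}: {count}")
```

On this data #11967 (5 failures) and #11580 (4 failures) account for most of the week's red builds.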
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3753/pipeline/) on cfe8318 - failed to docker pull from DockerHub
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3755/pipeline) on 9f4bf38 - failed to docker pull from DockerHub
- jenkins on 99d42b2 - test_qlinearadd tolerance, filed https://github.com/apache/tvm/issues/12062
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3767/pipeline) on f769f4e - Skipped has no len, added to https://github.com/apache/tvm/issues/11749
- [jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/detail/main/3769/pipeline) on a81e69a - http flake
- jenkins on https://github.com/apache/tvm/commit/ef5c3ed872c33a1587dd41c6c97dd85350df7269 - failure on git push to https://github.com/apache/tvm-site.git
- jenkins on https://github.com/apache/tvm/commit/3992d2443acba6a824ec4da58bfc30f9e0e5d5b5 - numerical error at `test_myfloat()` in `test_custom_datatypes.py`, filed https://github.com/apache/tvm/issues/12092
- jenkins on https://github.com/apache/tvm/commit/b9fa576ab35ef67e0a7fbaf669109f010e66e20c - `Error response from daemon: Get "https://registry-1.docker.io/v2/tlcpack/ci-i386/manifests/sha256:096c51de2a27fd3e497c17433937a4cd4357fe9623f06f6b2a47e094cf515b14": EOF`
Summary for 7/25/22 - 7/29/22:
- jenkins on https://github.com/apache/tvm/commit/eada707a7027bec47cf003916d349893a9249e4d - failure on android-rpc build
- jenkins on https://github.com/apache/tvm/commit/88bbb405408f9d6ae13da7a35341ed83ce8bc9f1 - failure on git push to https://github.com/apache/tvm-site.git
- jenkins on https://github.com/apache/tvm/commit/03aed787df32cee8b4691246e4f9e41f72ebd051 - failure on ios-rpc
- jenkins on https://github.com/apache/tvm/commit/195e60b97a315f164b383f5b2d5b608f619ae036 - failure on duplicate global packedfunc
- jenkins on https://github.com/apache/tvm/commit/9a4d80c5fe7d756baa83708fd9679c7ae0fac195 - filed [Flaky Test] tests/python/unittest/test_custom_datatypes.py::test_myfloat #12238
- jenkins on https://github.com/apache/tvm/commit/aeda760e5e29eddd0a7ddb22c7031f9607440770 - filed #12238
- jenkins on https://github.com/apache/tvm/commit/578ef035b201d7fa9756d51ac2e89e4a90663990 - filed #12238
Summary for 2022-07-30 to 2022-08-06
| Run | Commit | Mitigation |
|---|---|---|
| tvm-ci/branch | 1f97f1fbd55ff223dc73dbe62ae1c9b7aa16337a from #11809 | Timeout fixed by #12334 |
| tvm-ci/branch | 485bfaf1ea9d7afaf7fa6cb3e03415364ffc4412 from #12306 | Fixed by #12325 |
| tvm-ci/branch | 4158738574b7187eb6c9d4f8c473e3c707e268a2 from #12301 | Flaky test, filed #12311 and opened #12312 to improve reporting |
| tvm-ci/branch | 2bfd52f8855eccb0515bb517cfdbc242029b4001 from #12278 | Fixed by #12282 |
| tvm-ci/branch | 12502cc835157ed93e2ccba855a2b3cdfd6d1331 from #12251 | Fixed by #12268 |
| tvm-ci/branch | a231a1d724283545c16221fc1316dd77cfa840b6 from #12245 | Added retries to fix: #12306 |
| tvm-ci/branch | fb87c21bf8d0fa5edec96a054a57a6d37c11289f from #12234 | Fixed by #12306 |
| tvm-ci/branch | dff5c975a082e6f15b556914a029541b63ff1280 from #11037 | Fixed by #12306 |
| tvm-ci/branch | db4380cf41757e19c56c62ae3bf3a441c44de521 from #12230 | Fixed by #12268 |
Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921
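The gist itself is not reproduced here. As an assumption about what such a generator does, this sketch covers only its last step: rendering failure records into the `Run | Commit | Mitigation` table used in these summaries. Querying Jenkins for failed runs is omitted, and the `Failure` record and `render_table` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Failure:
    """One failed post-merge run (hypothetical record type)."""
    run: str              # CI job name, e.g. "tvm-ci/branch"
    sha: str              # commit SHA on main
    pr: Optional[int]     # PR that produced the commit, if known
    mitigation: str       # free-text note from the on-call engineer


def _cell(text: str) -> str:
    # A literal "|" would split the Markdown cell, so escape it.
    return text.replace("|", "\\|")


def render_table(failures: Iterable[Failure]) -> str:
    """Render failures as the Run | Commit | Mitigation Markdown table."""
    lines = ["| Run | Commit | Mitigation |", "|---|---|---|"]
    for f in failures:
        commit = f"{f.sha} from #{f.pr}" if f.pr is not None else f.sha
        lines.append(f"| {_cell(f.run)} | {_cell(commit)} | {_cell(f.mitigation)} |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_table([
        Failure("tvm-ci/branch", "1f97f1fbd55ff223dc73dbe62ae1c9b7aa16337a", 11809,
                "Timeout fixed by #12334"),
    ]))
```

Keeping the PR number optional matches rows like the `CI / Android` entry, which list a bare commit.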
Summary for 2022-08-07 - 2022-08-14:
- jenkins on fc411dc6fad14909ed17ce8c39d621d4587441bc - Failure due to a timeout in the Cortex-M shards, fixed in #12334.
- jenkins on 9b860095532d2a01525a449fac2bdfc0813bf4cc - Failure due to a timeout in the Cortex-M shards, fixed in #12334.
- jenkins on 8e133b1990eefda82c86b4f5e30191d720effcc4 - Sent interrupt signal to the CI.
- jenkins on 7f800e41de097c333ff5790feb1a8c42575b0fe1 - Fixed by #12341.
- CI MacOS on 52d6b59a39f503fe382b4d7cbac4b02f9e44aae0 - Filed #12449.
- jenkins on 22102063dccf42c29f9d43ee5684026ba67a3386 - Jenkinsfile lint failure, fixed by #12360.
- jenkins on 5d72bc1a20461925e3e4f7f47907ae9173bb183b - Jenkinsfile lint failure, fixed by #12360.
- jenkins on 7f100158a551ae94db5d401f71479ba341adc7d5 - Jenkinsfile lint failure, fixed by #12360.
- jenkins on 52152e0be641c80c0dbb1e36ab3654efbd27661f - Jenkinsfile lint failure, fixed by #12387.
- jenkins on 48354ded387b553b994caedade63ebb08b3ebd30 - Jenkinsfile lint failure, fixed by #12387.
- jenkins on 5deb95a9472002c9fa36a150f9e348f4276d63c5 - Deploy docs failed to git push in the CI.
- jenkins on c3c7c4ccc3e3ebb7cc4dbb55e8dde579ac52c949 - Failed due to a flaky test (#12451).
- jenkins on 3eb673478bc444daf24ee8d6308a42a71c81b74f - Failed due to a flaky test (#12451).
- jenkins on 1737308397e0f105c387d9ee6c466584aded1d7d - Linting failed with exit code 4.
Summary for 2022-08-15 to 2022-08-22
| Run | Commit | Mitigation |
|---|---|---|
| tvm-ci/branch | d805ae3bd99a71255dc7f7d1d5aa0746ab2ed21e from #12425 | Internet errors - flaky test reported in #12465 |
| tvm-ci/branch | 1ba17fe48b254a287458834041bd4aaf7d9f49b5 from #12401 | Timeout errors - flaky test reported in #12464 |
| tvm-ci/branch | bd562313250ce6a1bcc90eb7e94f64bc10563104 from #12443 | Timeout errors - flaky test reported in #12464 |
| tvm-ci/branch | #12478 | flaky test reported in #12511 |
| tvm-ci/branch | #12441 | flaky test reported in #12511 |
| tvm-ci/branch | #12483 | flaky test reported in #12511 |
| tvm-ci/branch | #12513 | flaky test reported in #12511 |
| tvm-ci/branch | #12532 | flaky test reported in #12511 |
| tvm-ci/branch | #12508 | flaky test reported in #12511 |
| tvm-ci/branch | #12551 | flaky test reported in #12511 |
| tvm-ci/branch | #12539 | doc build failed |
Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921
Summary for 2022-08-22 to 2022-08-29
| Run | Commit | Mitigation |
|---|---|---|
| tvm-ci/branch | 1afd0593956066635ee49297b731726c9218c91c from #12340 | Checks API failure (https://github.com/apache/tvm/issues/12602) |
| tvm-ci/branch | 90b2f0d36996be10d71f0c923f588c6dfa0e8546 from #12557 | maven problem (https://github.com/apache/tvm/issues/12601) |
| CI / Android | 8174d082e8168db9ad63826c9d68aee8c76c7090 | android_rpc build failure (https://github.com/apache/tvm/issues/12599) |
| tvm-ci/branch | 13ebbfb37f8cec1da71d88fbcbecdd4ad4d24dcc from #12562 | Deploy docs failed to git push in the CI (https://github.com/apache/tvm/issues/12600) |
| tvm-ci/branch | 52779f1273b05d53d8213e23e70d9b0ac82fd0b9 from #12353 | ethos-u failures (https://github.com/apache/tvm/issues/12511) |
| tvm-ci/branch | 3983a472c6f3ad4ad9604ceeffdf80cce01d166b from #12543 | ethos-u failures (https://github.com/apache/tvm/issues/12511) |
| tvm-ci/branch | d26bf809e4c3c8d6576d4e436475997eb12deb3e from #12541 | ethos-u failures (https://github.com/apache/tvm/issues/12511) |
| tvm-ci/branch | 534412896e6d39ee4f830d63370d02e8e5f09050 from PR https://github.com/apache/tvm/pull/12623 | internal pytest failure during ethos-u testing (https://github.com/apache/tvm/issues/12634) |
Generated by https://gist.github.com/driazati/80cd48e86c6548cd90a6b39be010b921
Summary for 8/29 - 9/2
- jenkins on https://github.com/apache/tvm/commit/0de22196db5f818a6937f026db43785935b9e731 - Unable to find image 'tlcpack/ci-lint:20220810-060142-fae79bbc3' locally
- jenkins on https://github.com/apache/tvm/commit/0de22196db5f818a6937f026db43785935b9e731 - Segmentation fault (core dumped)
- jenkins on https://github.com/apache/tvm/commit/74988d36bd578b791bbdcea383d343d62029e9cf - Failed unit tests
- jenkins on https://github.com/apache/tvm/commit/58ee935a53893bfd47b9cd7ea4738ecec8d7181e - Failed unit tests
- jenkins on https://github.com/apache/tvm/commit/a399e6ce9759cd524fcb8f804749baa426096e4b - Failed unit tests
- jenkins on https://github.com/apache/tvm/commit/aa6c7123d0a2cdd93256c6a4576ff029008fd375 - segfault in tests/scripts/setup-pytest-env.sh
- jenkins on https://github.com/apache/tvm/commit/50dad0d9a3c85f7692025b5330ceb902e264bb92 - failed to push due to merge conflict
- jenkins on https://github.com/apache/tvm/commit/eecb7fd494052ca941f3d123daa2e887f14b7e75 - segfault in tests/scripts/setup-pytest-env.sh
- jenkins on https://github.com/apache/tvm/commit/b2d660006446f720f0c9488f96d28387cbd0d294 - ERROR tests/python/frontend/darknet/test_forward.py - urllib.error.HTTPError
- jenkins on https://github.com/apache/tvm/commit/bb56f2a972606b33e5479d1e18d4c4f13751eeed - HTTP error 502 on `tests/python/frontend/tflite/test_forward.py`
- jenkins on https://github.com/apache/tvm/commit/0549a08f4de40a5a0db277cfe1ae00ab22fc9107 - urllib.error.HTTPError: HTTP Error 503: Service Unavailable
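Several failures in this thread are transient network errors: DockerHub pull failures, "http flake" entries, and HTTP 502/503 responses while tests download files. The 2022-07-30 summary lists one such failure mitigated with "Added retries to fix: #12306". The sketch below is a generic retry-with-backoff wrapper, not the code from #12306; it assumes the flaky operation is a Python callable that raises `urllib.error` exceptions, and the `retry` helper and the set of retryable status codes are illustrative assumptions.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: server-side and rate-limit errors are worth retrying; 4xx like 404 are not.
RETRYABLE_HTTP_CODES = frozenset({429, 500, 502, 503, 504})


def retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying transient urllib errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        last = attempt == attempts - 1
        try:
            return fn()
        except urllib.error.HTTPError as e:
            # HTTPError subclasses URLError, so it must be handled first.
            if e.code not in RETRYABLE_HTTP_CODES or last:
                raise
        except urllib.error.URLError:
            # Connection-level failures reported as URLError are treated as transient.
            if last:
                raise
        # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
        sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A download would be wrapped as, for example, `retry(lambda: urllib.request.urlopen(url, timeout=30).read())`. Injecting `sleep` keeps the backoff testable without real delays, and non-retryable errors such as a 404 still fail on the first attempt so genuine breakages are not masked.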