tvm
[CI Image] support CSI-NN2 in ci_qemu
- [x] S0. Reason: Introduce CSI-NN2 Compute Library into TVM to accelerate the inference performance of RISC-V CPU with Vector Extension
- [x] S1. Tag of nightly build: TAG: 20220713-060117-ca88c522f
- [x] S2. The nightly is built on TVM commit: https://github.com/apache/tvm/pull/11689 https://github.com/apache/tvm/pull/11905 https://github.com/apache/tvm/pull/12230
- [ ] S3. Testing the nightly image on ci-docker-staging:
- [ ] S4. Retag TAG to VERSION:
  docker pull tlcpackstaging/ci_qemu:20220720-060122-eb7cf7051
  docker tag tlcpackstaging/ci_qemu:20220720-060122-eb7cf7051 tlcpack/ci_qemu:20220720-060122-eb7cf7051
  docker push tlcpack/ci_qemu:20220720-060122-eb7cf7051
- [ ] S5. Check if the new tag is really there: https://hub.docker.com/u/tlcpack
- [ ] S6. Submit a PR updating the IMAGE_NAME version on Jenkins
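For reference, the retag step (S4) can be sketched as a small Python helper that builds the three docker commands from an image name and a tag. This is only an illustration, not part of TVM's CI tooling: the function name `retag_commands` and the tag pattern are assumptions, with the pattern inferred from the tags quoted in this issue (`<YYYYMMDD>-<HHMMSS>-<short commit hash>`).

```python
import re
import shlex

# Assumed tag layout, inferred from the tags quoted in this issue:
# <YYYYMMDD>-<HHMMSS>-<short commit hash>
TAG_PATTERN = re.compile(r"^\d{8}-\d{6}-[0-9a-f]{7,40}$")


def retag_commands(image, tag, staging_org="tlcpackstaging", release_org="tlcpack"):
    """Return the pull/tag/push commands that promote a staging image (hypothetical helper)."""
    if not TAG_PATTERN.match(tag):
        raise ValueError(f"unexpected tag format: {tag!r}")
    src = f"{staging_org}/{image}:{tag}"
    dst = f"{release_org}/{image}:{tag}"
    return [
        ["docker", "pull", src],
        ["docker", "tag", src, dst],
        ["docker", "push", dst],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be reviewed first.
    for cmd in retag_commands("ci_qemu", "20220720-060122-eb7cf7051"):
        print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the push a deliberate manual step; validating the tag first guards against promoting something like `latest` by accident.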
hi @areusch, the nightly rebuild has passed. So next we need to push the image to tlcpack, right?
The docker build passed, but it looks like the validation failed. Something between commits cf15375e2 and ca88c522f is causing this error on the QEMU build (example):
[2022-07-15T18:49:46.656Z] INFO:__main__:b'[100%] [QEMU] CPU: riscv32\n'
[2022-07-15T18:49:46.656Z] qemu-system-riscv32: error while loading shared libraries: libsnappy.so.1: cannot open shared object file: No such file or directory
[2022-07-15T18:49:46.656Z] make[3]: *** [zephyr/CMakeFiles/run] Error 127
[2022-07-15T18:49:46.656Z] INFO:__main__:b"zephyr/CMakeFiles/run.dir/build.make:70: recipe for target 'zephyr/CMakeFiles/run' failed\n"
[2022-07-15T18:49:46.656Z] make[2]: *** [zephyr/CMakeFiles/run.dir/all] Error 2
[2022-07-15T18:49:46.656Z] INFO:__main__:b"CMakeFiles/Makefile2:2689: recipe for target 'zephyr/CMakeFiles/run.dir/all' failed\n"
[2022-07-15T18:49:46.656Z] make[1]: *** [zephyr/CMakeFiles/run.dir/rule] Error 2
[2022-07-15T18:49:46.656Z] INFO:__main__:b"CMakeFiles/Makefile2:2696: recipe for target 'zephyr/CMakeFiles/run.dir/rule' failed\n"
[2022-07-15T18:49:46.656Z] make: *** [run] Error 2
[2022-07-15T18:49:46.656Z] INFO:__main__:b"Makefile:527: recipe for target 'run' failed\n"
[2022-07-15T20:35:49.431Z] Sending interrupt signal to process
[2022-07-15T20:35:49.756Z] Sending interrupt signal to process
[2022-07-15T20:35:56.719Z] script returned exit code 143
and for some reason the job doesn't exit but sits there until it hits the 3 hr timeout. #11689 is the only change to the docker/ folder in that range, but the cause could also be other code changes or even unpinned dependencies updating
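A quick way to confirm this kind of failure inside the image is to run `ldd` on the binary and look for entries marked `not found`. The sketch below is a minimal example under that assumption (Linux, `ldd` available); the helper names `missing_libraries` and `check_binary` are hypothetical and not part of TVM.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return library names that an `ldd` report marks as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found".
        name, sep, target = line.strip().partition(" => ")
        if sep and target.strip() == "not found":
            missing.append(name)
    return missing


def check_binary(path):
    """Run `ldd` on a binary (Linux only) and list its unresolved libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

Calling `check_binary` on the full path of the `qemu-system-riscv32` binary in the image would be expected to list `libsnappy.so.1` if the library is absent, matching the loader error in the log above.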
thanks @driazati, it seems to be a problem with environment variables. Let me fix it.
hi @driazati, it is a conflict between Xuantie QEMU and Zephyr QEMU. I have solved it by modifying the script in CSI-NN2 (merged), so no changes to the TVM code are needed. It seems that we need to wait for the nightly rebuild before continuing.
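When two QEMU installations coexist in one image, it can help to list every binary with the same name on `PATH`, in lookup order, to see which one shadows the other. The following is a minimal sketch of that check; `find_on_path` is a hypothetical helper, and it assumes the conflict shows up as duplicate executables on `PATH`, which this thread does not confirm.

```python
import os


def find_on_path(program, path=None):
    """Return every executable named `program` on the search path, in lookup order."""
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, program)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


if __name__ == "__main__":
    # The first entry is the one the shell would run; later ones are shadowed.
    for location in find_on_path("qemu-system-riscv32"):
        print(location)
```

If more than one path is printed, the first is the binary that actually runs, so a changed `PATH` order after installing a second QEMU would be visible here.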
hi @driazati, I used the image tlcpackstaging/ci_qemu:20220720-060122-eb7cf7051 to run the test locally. I didn't hit the error mentioned above, but I ran into a different, minor problem. I'm not sure whether it is a real problem or my configuration is wrong. Can we use CI to check the validity again?
@alter-xp we run these nightly, here's the latest failure. resolving the ARM toolchain seems maybe-related to this...what if instead of trying to fix the two together, we just create ci_riscv now? if you create Dockerfile.ci_riscv, i can help with getting that one built and added to various scripts.
thanks @areusch, i will create Dockerfile.ci_riscv. do we need to create an RFC for this?
i think we're ok w/o RFC for this one as it's just moving around previously-RFC'd functionality. thanks!
hi @areusch, I created a Dockerfile.ci_riscv (pr), which retains only the basic environment and the RISC-V-related content from Dockerfile.ci_qemu. Do we need to use CI to build the ci_riscv image next? I'm not familiar with this, so can you list the next steps and what I need to do? thanks!
@alter-xp sorry--i've been a bit busy lately. looks like @driazati approved/merged that one, and i created https://github.com/tlc-pack/tlcpack/pull/131 to add that build to our nightly docker image rebuild/test. looks like that pipeline is green now so we should be able to get that image built today or tomorrow and then we can begin using it. i'll update this thread when it's ready.