tvm
[Bug] Enabling USE_DNNL causes Sphinx to crash when processing from_oneflow.py and from_paddle.py for the documentation
Expected behavior
Setting USE_DNNL to ON or OFF should not affect documentation processing.
Actual behavior
With USE_DNNL set to ON, running "docker/bash.sh --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7 ./tests/scripts/task_python_docs.sh" crashes while processing from_oneflow.py and from_paddle.py.
Environment
Docker images: tlcpack/ci-gpu:20220630, tlcpack/ci-gpu:20220619
Steps to reproduce
mkdir ./build
cp ./cmake/config.cmake ./build/
echo 'set(USE_DNNL ON)' >> ./build/config.cmake
docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd build
cmake ../
make
cd ../
./tests/scripts/task_python_docs.sh
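The config edit in the steps above can also be scripted. The following is a minimal sketch, assuming the option appears in config.cmake as a plain `set(NAME VALUE)` line; `set_cmake_option` is a hypothetical helper written for illustration and is not part of TVM.

```python
import re


def set_cmake_option(config_text: str, name: str, value: str) -> str:
    """Return config_text with `set(name value)` applied.

    Hypothetical helper (not part of TVM): rewrites an existing,
    uncommented `set(NAME ...)` line, or appends one if none is found.
    """
    # Match only the exact option name, so USE_DNNL does not hit USE_DNNL_CODEGEN.
    pattern = re.compile(
        r"^[ \t]*set\(\s*" + re.escape(name) + r"\s+[^)]*\)", re.MULTILINE
    )
    new_line = f"set({name} {value})"
    if pattern.search(config_text):
        # A callable replacement avoids backslash interpretation in `value`.
        return pattern.sub(lambda _match: new_line, config_text)
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"
```

Reading ./build/config.cmake, passing its text through `set_cmake_option(text, "USE_DNNL", "ON")`, and writing the result back should have the same effect as the `echo` line above, and makes it easy to flip the option back to OFF when comparing the two builds.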
Debug information
docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd _staging
gdb python3
set args -m sphinx -b html -d /workspace/docs/_build/doctrees . /workspace/docs/_build/html
r
The crash can be seen in 'dlopen' while processing "from_oneflow.py". After setting USE_DNNL to OFF and rebuilding, the issue goes away.
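To check whether the crash depends on Sphinx at all, the suspect imports can be run in a child process, so that a segfault does not take down the parent. This is only a sketch under assumptions: this report does not confirm that importing `tvm` and `oneflow` together is enough to trigger the 'dlopen' failure, and `run_isolated` is a hypothetical helper, not an existing TVM utility.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit code.

    On POSIX, a negative return code -N means the child was killed by
    signal N (for example, -11 for SIGSEGV).
    """
    result = subprocess.run([sys.executable, "-c", snippet])
    return result.returncode


if __name__ == "__main__":
    # Assumption: these import orders are guesses at what triggers the crash.
    for snippet in ("import tvm; import oneflow", "import oneflow; import tvm"):
        code = run_isolated(snippet)
        status = f"killed by signal {-code}" if code < 0 else f"exit code {code}"
        print(f"{snippet!r}: {status}")
```

If one import order is killed by a signal while the other exits normally, and the difference disappears with USE_DNNL OFF, that would point toward a shared-library conflict at load time instead of a Sphinx problem.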
@driazati
@huajsj just curious: why did you need USE_DNNL ON in your tutorial? Is that closer to the use case for the pipeline executor, or is it possible to demonstrate the pipeline executor with just two LLVM graphs?
@areusch, thanks for the follow-up. Yes, BYOC should be the use case for the pipeline executor, which aims to bring different backends/hardware together for heterogeneous parallel execution and improved performance.
Besides DNNL, CUTLASS is another option for a BYOC backend. I am trying to bring up a CUTLASS example; if that still does not work, I will definitely fall back to the two-LLVM tutorial.
After switching to CUTLASS+BYOC in PR 11557, the crash is gone, so this issue is no longer a blocker for PR 11557.
Hi @huajsj, did you observe the failure before PR https://github.com/apache/tvm/pull/11638 was merged? Can we rule that one out?
@huajsj are you able to look at the question above?
@areusch @billishyahao, thanks for the follow-up. I tried a build from before PR https://github.com/apache/tvm/pull/11638 and still saw the issue.