
[Bug] Enabling USE_DNNL causes Sphinx to crash when processing from_oneflow.py and from_paddle.py for documents

Open huajsj opened this issue 2 years ago • 7 comments

Expected behavior

Setting USE_DNNL to ON or OFF should not affect document processing.

Actual behavior

With USE_DNNL set to ON, running "docker/bash.sh --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7 ./tests/scripts/task_python_docs.sh" crashes while processing from_oneflow.py and from_paddle.py.

Environment

Docker images: tlcpack/ci-gpu:20220630 and tlcpack/ci-gpu:20220619

Steps to reproduce

mkdir ./build
cp ./cmake/config.cmake ./build/
echo "set(USE_DNNL ON)" >> ./build/config.cmake
docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd build
cmake ../
make
cd ../
./tests/scripts/task_python_docs.sh

Debug information

docker/bash.sh -it --env CI --env TVM_SHARD_INDEX --env TVM_NUM_SHARDS --env RUN_DISPLAY_URL --env PLATFORM tlcpack/ci-gpu:20220630-060117-558ba99c7
cd _staging
gdb python3
set args -m sphinx -b html -d /workspace/docs/_build/doctrees   . /workspace/docs/_build/html
r

The crash can be seen in 'dlopen' while processing "from_oneflow.py". After setting USE_DNNL OFF and rebuilding, the issue goes away.
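To narrow down which shared library triggers the 'dlopen' crash, one option is to load candidate libraries in a child Python process and check whether the child was killed by a signal. The following is only a minimal sketch using the standard library; the function names are hypothetical, and the library paths to probe (for example the TVM build output or a framework's shared objects) are assumptions that have to be filled in for the actual environment. It does not reproduce the Sphinx crash itself.

```python
import subprocess
import sys


def dlopen_in_subprocess(lib_path, timeout=60):
    """Load lib_path with ctypes in a child interpreter.

    Returns (returncode, stderr). On POSIX a negative return code means
    the child was terminated by a signal (e.g. -11 for SIGSEGV), so a
    crash in the child does not take down the calling process.
    """
    # The child receives the library path as sys.argv[1].
    code = (
        "import ctypes, sys; "
        "ctypes.CDLL(sys.argv[1], mode=ctypes.RTLD_GLOBAL)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, lib_path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "loaded"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return "failed with a Python exception"


if __name__ == "__main__":
    # Paths are passed on the command line; none are assumed here.
    for path in sys.argv[1:]:
        rc, err = dlopen_in_subprocess(path)
        print(f"{path}: {describe(rc)}")
```

Running the script with one library path per argument, once with a USE_DNNL ON build and once with a USE_DNNL OFF build, would show whether a plain load of each library already fails outside of Sphinx.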

huajsj avatar Jul 06 '22 17:07 huajsj

@driazati

huajsj avatar Jul 06 '22 17:07 huajsj

@huajsj just curious, why did you need USE_DNNL ON in your tutorial? Is that closer to the use case for pipeline executor, or is it possible to demonstrate pipeline executor with just two LLVM graphs?

areusch avatar Jul 11 '22 21:07 areusch

@areusch, thanks for the follow-up. Yes, BYOC should be the use case for pipeline executor, which aims to bring different backends/hardware together for heterogeneous parallel execution and improved performance.

Besides DNNL, CUTLASS is another option for a BYOC backend. I am trying to see if I can bring up a CUTLASS example; if that still does not work, I will definitely go with the two-LLVM tutorial.

huajsj avatar Jul 13 '22 22:07 huajsj

After switching to CUTLASS+BYOC in PR 11557, the crash is gone, so this issue is no longer a blocker for PR 11557.

huajsj avatar Jul 18 '22 03:07 huajsj

Hi @huajsj, did you observe the failure before the PR https://github.com/apache/tvm/pull/11638 was merged? Can we rule that one out?

billishyahao avatar Jul 19 '22 06:07 billishyahao

@huajsj are you able to look at the question above?

areusch avatar Jul 25 '22 04:07 areusch

@areusch @billishyahao, thanks for the follow-up. I tried with the code from before PR https://github.com/apache/tvm/pull/11638 and still saw the issue.

huajsj avatar Jul 28 '22 01:07 huajsj