Cannot generate .json event trace
I ran the following command:
LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace --chrome-sycl-logging --chrome-dnn-logging --chrome-call-logging --chrome-kernel-logging --chrome-device-logging python test.py
This ends in a segmentation fault, and the generated JSON files are empty.
When I run the command with these options instead:
LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace -d -s -t --chrome-kernel-logging --chrome-device-logging --chrome-no-thread-on-device --chrome-no-engine-on-device python test.py
The run aborts, but the generated JSON files contain some logging records.
Hello @yitian1031, thanks for reporting the issue. I have a few questions and suggestions to help narrow it down.
- Can you run test.py without unitrace? The call stack you shared looks like an application (test.py) error caused by a bad allocation, which is why I ask.
- Run unitrace with the '-c' option to see which API call is crashing. That will show whether a particular kernel launch failed because of an application bug.
- By default, unitrace writes the .json file only at the end of a successful run. Because this run crashed, the file is empty.
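Given that behavior, one quick way to tell an unflushed trace from a usable one is to try parsing the file. The following is a minimal sketch, assuming the output follows the Chrome trace event format (either a bare JSON array or an object with a `traceEvents` key). The helper name `count_trace_events` is hypothetical and not part of unitrace.

```python
import json
from pathlib import Path


def count_trace_events(path):
    """Return the number of events in a trace file, or None if it is unusable.

    Assumption: the file is in Chrome trace event format, i.e. either a JSON
    array of events or a JSON object with a "traceEvents" array.
    """
    text = Path(path).read_text(errors="replace").strip()
    if not text:
        return None  # empty file: nothing was written before the crash
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # truncated or otherwise invalid JSON
    if isinstance(data, list):
        return len(data)
    if isinstance(data, dict) and isinstance(data.get("traceEvents"), list):
        return len(data["traceEvents"])
    return None  # valid JSON, but not a recognizable trace layout
```

A result of `None` means the file is empty, truncated, or not in the assumed layout; `0` means the file is well-formed but holds no events.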
test.py runs successfully without unitrace.
Following your suggestion, I added the -c option, and it appears that the abort happens in zeCommandListAppendLaunchKernel.
A different error occurs when I use the command below:
LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace --chrome-kernel-logging --chrome-device-logging python test.py
When I raise the stack size limit with ulimit -s, the issues above still occur.
It seems something is wrong with the unitrace tool. How can I fix it?
@yitian1031, are you running it in a conda environment? If yes, can you build the tool fresh and try running it again? We have sometimes seen different conda environments link against different libraries, so building in one environment and running in another may cause issues.
I am running in a conda environment. I rebuilt the tool from the latest code, and this time the tool fails to run.
@yitian1031 Please check the version of libstdc++.so in your conda env. If it is lower than 6.0.30, you need to upgrade it to at least 6.0.30.
Also, you don't need to preload libunitrace_tool.so.
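To check the libstdc++ version, something like the sketch below may help. It assumes the active environment is exposed through the `CONDA_PREFIX` variable and that the library sits in `$CONDA_PREFIX/lib` under the usual `libstdc++.so.X.Y.Z` file name. The function names are hypothetical helpers, not part of unitrace or conda.

```python
import os
import re
from pathlib import Path

MIN_VERSION = (6, 0, 30)  # minimum libstdc++ version named in this thread


def parse_libstdcxx_version(filename):
    """Parse 'libstdc++.so.6.0.30' into (6, 0, 30); return None otherwise."""
    m = re.fullmatch(r"libstdc\+\+\.so\.(\d+)\.(\d+)\.(\d+)", filename)
    return tuple(int(g) for g in m.groups()) if m else None


def newest_libstdcxx(lib_dir):
    """Return the highest fully versioned libstdc++ found in lib_dir, or None."""
    lib_dir = Path(lib_dir)
    if not lib_dir.is_dir():
        return None
    versions = []
    for entry in lib_dir.glob("libstdc++.so.*"):
        version = parse_libstdcxx_version(entry.name)
        if version is not None:
            versions.append(version)
    return max(versions, default=None)


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")  # set when a conda env is active
    if prefix is None:
        print("CONDA_PREFIX is not set; no conda environment appears active.")
    else:
        found = newest_libstdcxx(Path(prefix) / "lib")
        if found is None:
            print("No versioned libstdc++.so found in", Path(prefix) / "lib")
        elif found < MIN_VERSION:
            print("libstdc++", found, "is older than", MIN_VERSION)
        else:
            print("libstdc++", found, "meets the minimum", MIN_VERSION)
```

Run it inside the same conda environment used to build and launch unitrace, so the check covers the library that environment provides.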