pti-gpu

Cannot generate .json event trace

Open yitian1031 opened this issue 1 year ago • 6 comments

Running the command shown below:

LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace --chrome-sycl-logging --chrome-dnn-logging --chrome-call-logging --chrome-kernel-logging --chrome-device-logging python test.py

results in a segmentation fault:

[image]

The generated json files contain nothing.

When running the command as:

LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace -d -s -t --chrome-kernel-logging --chrome-device-logging --chrome-no-thread-on-device --chrome-no-engine-on-device python test.py

the run ends with an "Aborted" error:

[image]

The generated json files contain some logging records.

yitian1031 • Mar 27 '24 08:03

Hello @yitian1031, thanks for reporting the issue. I have a few questions/suggestions to help narrow it down.

  1. Are you able to run test.py without unitrace? I ask because the call stack you shared looks like an application (test.py) error caused by a bad allocation.
  2. Run unitrace with the '-c' option to check which API call is crashing. It will help you see whether a particular kernel launch failed due to an application bug.
  3. By default, unitrace writes the .json file only at the end of a successful run. Since the run crashed, you are seeing an empty file.
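
The first two checks can be scripted so the two runs are easy to compare. The following is only a rough sketch: `describe_status` and `run_and_report` are hypothetical helper names, and it assumes a POSIX-style shell where a process killed by a signal is reported as exit status 128 plus the signal number (so 139 would correspond to SIGSEGV and 134 to SIGABRT on Linux).

```shell
# Hypothetical helper: turn a shell exit status into a short description.
# Assumes the usual convention that 128 + N means "killed by signal N".
describe_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
    elif [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    else
        echo "exited with status $status"
    fi
}

# Hypothetical helper: run a command, then report how it ended.
run_and_report() {
    "$@"
    describe_status "$?"
}

# Usage sketch (assumes unitrace is on PATH and test.py is in the
# current directory; uncomment to try):
# run_and_report python test.py
# run_and_report unitrace -c python test.py
```

If the plain `python test.py` run reports "ok" while the unitrace run reports a signal, that would point at the tool or its environment rather than at the application.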

Sarbojit2019 • Apr 01 '24 03:04

test.py runs successfully without unitrace:

[image]

Following your suggestion, I added the -c option, and it seems that zeCommandListAppendLaunchKernel aborted:

[image]

Another error occurs when the command is set as below:

LD_PRELOAD=/home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/unitrace --chrome-kernel-logging --chrome-device-logging python test.py

[image]

[image]

When I use ulimit -s to raise the stack size to a bigger value, the above issues still occur.

It seems there is something wrong with the unitrace tool. How can I fix it?

yitian1031 • Apr 17 '24 07:04

@yitian1031, may I know if you are running it under a conda environment? If yes, can you build the tool fresh and try to run it again? We have sometimes seen different conda environments link different libraries, so building in one environment and running in another may cause issues.
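
One way to look for such a mismatch is to see which libstdc++ the dynamic loader resolves for the tool in each environment. This is a sketch only: `libstdcxx_from_ldd` is a hypothetical helper, and it assumes the usual `ldd` output format of `name => path (address)`.

```shell
# Hypothetical helper: read ldd-style output on stdin and print the
# resolved path of libstdc++ (the third field of "name => path (addr)").
libstdcxx_from_ldd() {
    awk 'index($1, "libstdc++.so") == 1 && $2 == "=>" { print $3 }'
}

# Usage sketch, using the build path from this thread; run it once in
# the env where the tool was built and once in the env where test.py
# runs, then compare the two paths (uncomment to try):
# ldd /home/yitian/wyt/unitrace1/pti-gpu/tools/unitrace/build/libunitrace_tool.so | libstdcxx_from_ldd
```

If the two environments print different paths, the tool may be loading a different libstdc++ at run time than the one it was built against.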

Sarbojit2019 • Apr 18 '24 16:04

I run under a conda environment. I rebuilt the tool from the latest code, and this time the tool cannot run at all:

[image]

[image]

yitian1031 • Apr 22 '24 02:04

@yitian1031 Please check the version of libstdc++.so in your conda env. If it is lower than 6.0.30, you need to upgrade it to at least 6.0.30.

Also, you don't need to preload libunitrace_tool.so.
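
A quick way to do that version check might look like the sketch below. It rests on assumptions: the active env is identified by the `CONDA_PREFIX` variable, the env ships versioned files such as `lib/libstdc++.so.6.0.30`, and `sort -V` from GNU coreutils is available. `version_ge` and `check_conda_libstdcxx` are hypothetical helper names.

```shell
# True when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the active conda env is given by CONDA_PREFIX and holds
# versioned files named like lib/libstdc++.so.6.0.30.
check_conda_libstdcxx() {
    required=6.0.30
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is not set; is a conda env active?"
        return 2
    fi
    for f in "$CONDA_PREFIX"/lib/libstdc++.so.6.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        ver=${f##*libstdc++.so.}         # keep only the version suffix
        if version_ge "$ver" "$required"; then
            echo "$f: $ver (ok)"
        else
            echo "$f: $ver (older than $required)"
        fi
    done
}

# Usage sketch (uncomment inside the activated env):
# check_conda_libstdcxx
```

How to upgrade depends on how the env was set up, so the sketch only reports what it finds.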

zma2 • Jun 24 '24 15:06