[Issue]: omnitrace-python gives a TypeError Exception on PyTorch script.
Problem Description
@feizheng10 told me that this is the best place to reach out to @jrmadsen
This is an issue that I encountered with Omnitrace 1.11.2. I followed all the steps here: https://rocm.github.io/omnitrace/python.html#running-omnitrace-on-a-python-script
and I can get the simple Fibonacci example to run correctly.
However, if I run a simple PyTorch script,
import torch

n_warmup = 1
m = 64
n = 64
k = 64
dtype = torch.float32

A = torch.rand(m, k, device="cuda", dtype=dtype)
B = torch.rand(k, n, device="cuda", dtype=dtype)

# Run warmup iters
for i in range(n_warmup):
    C = torch.matmul(A, B)
I end up with an exception:
root@ixt-hq-ubb4-33:/home/niromero/docker_workspace/example# OMNITRACE_PROFILE=ON OMNITRACE_TIMEMORY_COMPONENTS=trip_count omnitrace-python torch_simple.py
##### omnitrace :: executing 'python3 -m omnitrace torch_simple.py'... #####
[omnitrace]> profiling: ['/home/niromero/docker_workspace/example/torch_simple.py']
[omnitrace][2623926][omnitrace_init_tooling] Instrumentation mode: Trace
______ .___ ___. .__ __. __ .___________..______ ___ ______ _______
/ __ \ | \/ | | \ | | | | | || _ \ / \ / || ____|
| | | | | \ / | | \| | | | `---| |----`| |_) | / ^ \ | ,----'| |__
| | | | | |\/| | | . ` | | | | | | / / /_\ \ | | | __|
| `--' | | | | | | |\ | | | | | | |\ \----./ _____ \ | `----.| |____
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
omnitrace v1.11.2 (rev: 1df597e049b240fb263e7fcd7bddc78097d27f00, tag: v1.11.2, x86_64-linux-gnu, compiler: GNU v9.4.0, rocm: v6.0.x)
[omnitrace][2623926] /proc/sys/kernel/perf_event_paranoid has a value of 4. Disabling PAPI (requires a value <= 2)...
[omnitrace][2623926] In order to enable PAPI support, run 'echo N | sudo tee /proc/sys/kernel/perf_event_paranoid' where N is <= 2
[163.727] perfetto.cc:58649 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
Traceback (most recent call last):
File "/opt/omnitrace/lib/python/site-packages/omnitrace/__main__.py", line 385, in main
prof.runctx("execfile_(%r, globals())" % (script_file,), ns, ns)
File "/opt/omnitrace/lib/python/site-packages/omnitrace/profiler.py", line 219, in runctx
exec_(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/opt/omnitrace/lib/python/site-packages/omnitrace/__main__.py", line 56, in execfile
exec_(compile(f.read(), filename, "exec"), globals, locals)
File "/home/niromero/docker_workspace/example/torch_simple.py", line 1, in <module>
import torch
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/__init__.py", line 27, in <module>
from ._utils import _import_dotted_name, classproperty
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_utils.py", line 951, in <module>
class CallbackRegistry(Generic[P]):
File "/opt/conda/envs/py_3.9/lib/python3.9/typing.py", line 277, in inner
return func(*args, **kwds)
File "/opt/conda/envs/py_3.9/lib/python3.9/typing.py", line 997, in __class_getitem__
raise TypeError(
TypeError: Parameters to Generic[...] must all be type variables
Exception - Parameters to Generic[...] must all be type variables
[omnitrace][2623926][0][omnitrace_finalize] finalizing...
[omnitrace][2623926][0][omnitrace_finalize]
[omnitrace][2623926][0][omnitrace_finalize] omnitrace/process/2623926 : 0.342944 sec wall_clock, 6.572 MB peak_rss, 6.730 MB page_rss, 0.540000 sec cpu_clock, 157.5 % cpu_util [laps: 1]
[omnitrace][2623926][0][omnitrace_finalize] omnitrace/process/2623926/thread/0 : 0.339873 sec wall_clock, 0.239749 sec thread_cpu_clock, 70.5 % thread_cpu_util, 5.456 MB peak_rss [laps: 1]
[omnitrace][2623926][0][omnitrace_finalize]
[omnitrace][2623926][0][omnitrace_finalize] Finalizing perfetto...
[omnitrace][2623926][perfetto]> Outputting '/home/niromero/docker_workspace/example/omnitrace-torch_simple-output/2024-06-03_19.18/perfetto-trace-2623926.proto' (892.96 KB / 0.89 MB / 0.00 GB)... Done
[omnitrace][2623926][trip_count]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/trip_count-2623926.json'
[omnitrace][2623926][trip_count]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/trip_count-2623926.txt'
[omnitrace][2623926][metadata]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/metadata-2623926.json' and 'omnitrace-torch_simple-output/2024-06-03_19.18/functions-2623926.json'
[omnitrace][2623926][0][omnitrace_finalize] Finalized: 0.129738 sec wall_clock, 7.672 MB peak_rss, 7.856 MB page_rss, 0.140000 sec cpu_clock, 107.9 % cpu_util
[164.206] perfetto.cc:60128 Tracing session 1 ended, total sessions:0
I do get a trip count file, but it appears to be incomplete. I will attach it here as well: trip_count-2623926.txt
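For context on the TypeError itself: it is raised by typing.Generic when it is subscripted with something it does not recognize as a type variable (PyTorch's torch/_utils.py subscripts it with a ParamSpec, which Python 3.9's typing module does not accept in that position). A minimal reproduction of the same error, independent of omnitrace and PyTorch (the class names here are illustrative, not from the traceback):

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Ok(Generic[T]):  # a real TypeVar is accepted
    pass


err = None
try:
    class Bad(Generic[int]):  # int is not a type variable -> TypeError
        pass
except TypeError as e:
    err = e

print(err)  # e.g. "Parameters to Generic[...] must all be type variables"
```

This suggests the failure is in how the type parameter is seen at class-creation time under the profiler's exec context, rather than in the user script.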
I also have a much broader question. I am trying to profile a small PyTorch workload. We suspect that the bottleneck occurs in the C++ code that runs on the CPU. Is using omnitrace-python the correct approach?
Operating System
Ubuntu 20.04.6 LTS
CPU
AMD EPYC 7713 64-Core Processor
GPU
AMD Instinct MI250
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Sorry for the delay. Based on the backtrace, I believe you need to make the code importable with "main" support, i.e., wrap the code in a function and execute it only when the file is run as a script:
import torch


def run():
    n_warmup = 1
    m = 64
    n = 64
    k = 64
    dtype = torch.float32
    A = torch.rand(m, k, device="cuda", dtype=dtype)
    B = torch.rand(k, n, device="cuda", dtype=dtype)
    # Run warmup iters
    for i in range(n_warmup):
        C = torch.matmul(A, B)


if __name__ == "__main__":
    run()
On the plus side, this approach usually makes the traces much cleaner: you can put the @profile decorator above def run(): and pass the -b option to omnitrace-python, which avoids tracing all the function calls that arise from importing modules (e.g., everything triggered by import torch).
Example:
import torch


@profile
def run():
    n_warmup = 1
    m = 64
    n = 64
    k = 64
    dtype = torch.float32
    A = torch.rand(m, k, device="cuda", dtype=dtype)
    B = torch.rand(k, n, device="cuda", dtype=dtype)
    # Run warmup iters
    for i in range(n_warmup):
        C = torch.matmul(A, B)


if __name__ == "__main__":
    run()
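One practical note: a bare @profile is only defined when the script runs under the profiler with -b. A common pattern (a sketch, assuming omnitrace-python injects `profile` into builtins like other line profilers do; the trivial `run` body here is just a stand-in) is a no-op fallback so the same file also runs under plain python:

```python
import builtins

# Assumption: the profiler's -b option injects `profile` into builtins.
# If it is absent (plain `python script.py`), fall back to a no-op decorator.
if not hasattr(builtins, "profile"):
    def profile(func):
        return func


@profile
def run():
    # placeholder workload standing in for the torch code above
    return sum(i * i for i in range(10))


if __name__ == "__main__":
    print(run())
```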
@jrmadsen This command doesn't seem to work at the moment
wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py
It looks like omnitrace-install.py is missing in the latest release.
Instead, I tested with v1.12.0, and it seems to work now without any modification to the original torch_simple.py. I am going to close this as completed.