[Issue]: omnitrace-python gives a TypeError Exception on PyTorch script.
Problem Description
@feizheng10 told me that this is the best place to reach out to @jrmadsen
This is an issue that I encountered with Omnitrace 1.11.2. I followed all the steps here: https://rocm.github.io/omnitrace/python.html#running-omnitrace-on-a-python-script
and I can get the simple Fibonacci example to run correctly.
However, if I run a simple PyTorch script,
import torch

n_warmup = 1
m = 64
n = 64
k = 64
dtype = torch.float32

A = torch.rand(m, k, device="cuda", dtype=dtype)
B = torch.rand(k, n, device="cuda", dtype=dtype)

# Run warmup iters
for i in range(n_warmup):
    C = torch.matmul(A, B)
I end up with an exception:
root@ixt-hq-ubb4-33:/home/niromero/docker_workspace/example# OMNITRACE_PROFILE=ON OMNITRACE_TIMEMORY_COMPONENTS=trip_count omnitrace-python torch_simple.py
##### omnitrace :: executing 'python3 -m omnitrace torch_simple.py'... #####
[omnitrace]> profiling: ['/home/niromero/docker_workspace/example/torch_simple.py']
[omnitrace][2623926][omnitrace_init_tooling] Instrumentation mode: Trace
______ .___ ___. .__ __. __ .___________..______ ___ ______ _______
/ __ \ | \/ | | \ | | | | | || _ \ / \ / || ____|
| | | | | \ / | | \| | | | `---| |----`| |_) | / ^ \ | ,----'| |__
| | | | | |\/| | | . ` | | | | | | / / /_\ \ | | | __|
| `--' | | | | | | |\ | | | | | | |\ \----./ _____ \ | `----.| |____
\______/ |__| |__| |__| \__| |__| |__| | _| `._____/__/ \__\ \______||_______|
omnitrace v1.11.2 (rev: 1df597e049b240fb263e7fcd7bddc78097d27f00, tag: v1.11.2, x86_64-linux-gnu, compiler: GNU v9.4.0, rocm: v6.0.x)
[omnitrace][2623926] /proc/sys/kernel/perf_event_paranoid has a value of 4. Disabling PAPI (requires a value <= 2)...
[omnitrace][2623926] In order to enable PAPI support, run 'echo N | sudo tee /proc/sys/kernel/perf_event_paranoid' where N is <= 2
[163.727] perfetto.cc:58649 Configured tracing session 1, #sources:1, duration:0 ms, #buffers:1, total buffer size:1024000 KB, total sessions:1, uid:0 session name: ""
Traceback (most recent call last):
File "/opt/omnitrace/lib/python/site-packages/omnitrace/__main__.py", line 385, in main
prof.runctx("execfile_(%r, globals())" % (script_file,), ns, ns)
File "/opt/omnitrace/lib/python/site-packages/omnitrace/profiler.py", line 219, in runctx
exec_(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/opt/omnitrace/lib/python/site-packages/omnitrace/__main__.py", line 56, in execfile
exec_(compile(f.read(), filename, "exec"), globals, locals)
File "/home/niromero/docker_workspace/example/torch_simple.py", line 1, in <module>
import torch
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/__init__.py", line 27, in <module>
from ._utils import _import_dotted_name, classproperty
File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_utils.py", line 951, in <module>
class CallbackRegistry(Generic[P]):
File "/opt/conda/envs/py_3.9/lib/python3.9/typing.py", line 277, in inner
return func(*args, **kwds)
File "/opt/conda/envs/py_3.9/lib/python3.9/typing.py", line 997, in __class_getitem__
raise TypeError(
TypeError: Parameters to Generic[...] must all be type variables
Exception - Parameters to Generic[...] must all be type variables
[omnitrace][2623926][0][omnitrace_finalize] finalizing...
[omnitrace][2623926][0][omnitrace_finalize]
[omnitrace][2623926][0][omnitrace_finalize] omnitrace/process/2623926 : 0.342944 sec wall_clock, 6.572 MB peak_rss, 6.730 MB page_rss, 0.540000 sec cpu_clock, 157.5 % cpu_util [laps: 1]
[omnitrace][2623926][0][omnitrace_finalize] omnitrace/process/2623926/thread/0 : 0.339873 sec wall_clock, 0.239749 sec thread_cpu_clock, 70.5 % thread_cpu_util, 5.456 MB peak_rss [laps: 1]
[omnitrace][2623926][0][omnitrace_finalize]
[omnitrace][2623926][0][omnitrace_finalize] Finalizing perfetto...
[omnitrace][2623926][perfetto]> Outputting '/home/niromero/docker_workspace/example/omnitrace-torch_simple-output/2024-06-03_19.18/perfetto-trace-2623926.proto' (892.96 KB / 0.89 MB / 0.00 GB)... Done
[omnitrace][2623926][trip_count]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/trip_count-2623926.json'
[omnitrace][2623926][trip_count]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/trip_count-2623926.txt'
[omnitrace][2623926][metadata]> Outputting 'omnitrace-torch_simple-output/2024-06-03_19.18/metadata-2623926.json' and 'omnitrace-torch_simple-output/2024-06-03_19.18/functions-2623926.json'
[omnitrace][2623926][0][omnitrace_finalize] Finalized: 0.129738 sec wall_clock, 7.672 MB peak_rss, 7.856 MB page_rss, 0.140000 sec cpu_clock, 107.9 % cpu_util
[164.206] perfetto.cc:60128 Tracing session 1 ended, total sessions:0
I do get a trip count file, but it appears to be incomplete. I will attach it here as well: trip_count-2623926.txt
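For context on the TypeError itself: it is raised by typing.Generic when it is subscripted with something it does not recognize as a type variable (PyTorch's torch/_utils.py subscripts it with a ParamSpec, which Python 3.9's typing module does not accept in that position). A minimal reproduction of the same error, independent of omnitrace and PyTorch (the class names here are illustrative, not from the traceback):

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Ok(Generic[T]):  # a real TypeVar is accepted
    pass


err = None
try:
    class Bad(Generic[int]):  # int is not a type variable -> TypeError
        pass
except TypeError as e:
    err = e

print(err)  # e.g. "Parameters to Generic[...] must all be type variables"
```

This suggests the failure is in how the type parameter is seen at class-creation time under the profiler's exec context, rather than in the user script.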
I also have a much broader question. I am trying to profile a small PyTorch workload. We suspect that the bottleneck occurs in the C++ code that runs on the CPU. Is using omnitrace-python the correct approach?
Operating System
Ubuntu 20.04.6 LTS
CPU
AMD EPYC 7713 64-Core Processor
GPU
AMD Instinct MI250
ROCm Version
ROCm 6.0.0
ROCm Component
No response
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response
Sorry for the delay. Based on the backtrace, I believe you need to make the code importable with "main" support, i.e., wrap the code in a function and execute it only when the file is run as a script:
import torch


def run():
    n_warmup = 1
    m = 64
    n = 64
    k = 64
    dtype = torch.float32
    A = torch.rand(m, k, device="cuda", dtype=dtype)
    B = torch.rand(k, n, device="cuda", dtype=dtype)
    # Run warmup iters
    for i in range(n_warmup):
        C = torch.matmul(A, B)


if __name__ == "__main__":
    run()
On the plus side, this approach usually makes the traces much cleaner: you can put the @profile decorator above def run(): and pass the -b option to omnitrace-python, which avoids tracing all the function calls that arise from importing modules (e.g., everything triggered by import torch).
Example:
import torch


@profile
def run():
    n_warmup = 1
    m = 64
    n = 64
    k = 64
    dtype = torch.float32
    A = torch.rand(m, k, device="cuda", dtype=dtype)
    B = torch.rand(k, n, device="cuda", dtype=dtype)
    # Run warmup iters
    for i in range(n_warmup):
        C = torch.matmul(A, B)


if __name__ == "__main__":
    run()
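One practical note: a bare @profile is only defined when the script runs under the profiler with -b. A common pattern (a sketch, assuming omnitrace-python injects `profile` into builtins like other line profilers do; the trivial `run` body here is just a stand-in) is a no-op fallback so the same file also runs under plain python:

```python
import builtins

# Assumption: the profiler's -b option injects `profile` into builtins.
# If it is absent (plain `python script.py`), fall back to a no-op decorator.
if not hasattr(builtins, "profile"):
    def profile(func):
        return func


@profile
def run():
    # placeholder workload standing in for the torch code above
    return sum(i * i for i in range(10))


if __name__ == "__main__":
    print(run())
```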
@jrmadsen This command doesn't seem to work at the moment
wget https://github.com/ROCm/omnitrace/releases/latest/download/omnitrace-install.py
It looks like omnitrace-install.py is missing in the latest release.
Instead, I tested with v1.12.0, and it seems to work now without any modification to the original torch_simple.py. I am going to close this as completed.