omniperf
No such file or directory error
Using latest omniperf to run some xla tests.
GPU: MI300
omniperf profile -n scatt -- /grok/grok-1-rocm/xla/bazel-bin/xla/service/gpu/tests/select_and_scatter_test --gtest_filter=SelectAndScatterTest.SelectAndScatterPerformance
___ _ __
/ _ \ _ __ ___ _ __ (_)_ __ ___ _ __ / _|
| | | | '_ ` _ \| '_ \| | '_ \ / _ \ '__| |_
| |_| | | | | | | | | | | |_) | __/ | | _|
\___/|_| |_| |_|_| |_|_| .__/ \___|_| |_|
|_|
Omniperf version: 2.0.0-RC1
Profiler choice: rocprofv2
Path: /grok/grok-1-rocm/xla/workloads/scatt/MI300X_A1
Target: MI300X_A1
Command: /grok/grok-1-rocm/xla/bazel-bin/xla/service/gpu/tests/select_and_scatter_test --gtest_filter=SelectAndScatterTest.SelectAndScatterPerformance
Kernel Selection: None
Dispatch Selection: None
IP Blocks: All
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Collecting Performance Counters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[profiling] Current input file: /grok/grok-1-rocm/xla/workloads/scatt/MI300X_A1/perfmon/SQ_IFETCH_LEVEL.txt
|-> [/opt/rocm-6.2.0-13796/bin/rocprofv2] /bin/bash: warning: setlocale: LC_ALL: cannot change locale (en_US)
|-> [/opt/rocm-6.2.0-13796/bin/rocprofv2] /opt/rocm-6.2.0-13796/bin/rocprofv2: line 301: /grok/grok-1-rocm/xla/bazel-bin/xla/service/gpu/tests/select_and_scatter_test --gtest_filter=SelectAndScatterTest.SelectAndScatterPerformance: No such file or directory
|-> [/opt/rocm-6.2.0-13796/bin/rocprofv2]
ERROR Profiling execution failed.
However, the command works without omniperf!
@Ruturaj4 It looks like there was an issue setting the locale to UTF-8 on this system, particularly in this function call: https://github.com/ROCm/omniperf/blob/0c8591ccca179e2f22cd4e402197434619be40f5/src/utils/utils.py#L607-L615
Could you try running locale.setlocale(locale.LC_ALL, "en_US.UTF-8") manually (in a Python session) to see whether this is where the error comes from?
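For anyone who wants to run that check, here is a minimal sketch. It only exercises the standard-library call named above; the helper name `try_set_locale` is made up for illustration and is not part of Omniperf.

```python
import locale


def try_set_locale(name="en_US.UTF-8"):
    """Return (ok, detail): whether the C library accepts `name` for LC_ALL.

    Hypothetical helper for this thread, not an Omniperf function.
    """
    try:
        # setlocale returns the locale string that was actually applied.
        applied = locale.setlocale(locale.LC_ALL, name)
        return True, applied
    except locale.Error as exc:
        # Raised when the requested locale is not installed on the system.
        return False, str(exc)


if __name__ == "__main__":
    ok, detail = try_set_locale()
    print("locale OK:" if ok else "locale FAILED:", detail)
```

If this reports a failure, the locale is probably not generated inside the container. That would explain the setlocale warning, though not necessarily the "No such file or directory" error.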
Yeah, I tried that already. It looks like rocprofv2 has the same issue (I get the same warning when running rocprofv2 directly), but rocprofv2 itself works just fine.
Hi @coleramos425, I have a similar problem on both MI200 and MI300. I can't call omniperf profile this way: omniperf profile -n vcopy -- ./vcopy -n 1048576 -b 256. It gives me this error:
INFO Omniperf version: 2.0.0
INFO Profiler choice: rocprofv2
INFO Path: /root/workspace_raid/omniperf/sample/workloads/vcopy/MI200
INFO Target: MI200
INFO Command: ./vcopy -n 1048576 -b 256
INFO Kernel Selection: None
INFO Dispatch Selection: None
INFO Hardware Blocks: All
INFO
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO Collecting Performance Counters
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO
INFO [profiling] Current input file: /root/workspace_raid/omniperf/sample/workloads/vcopy/MI200/perfmon/SQ_IFETCH_LEVEL.txt
INFO |-> [rocprofv2] /usr/bin/rocprofv2: line 301: ./vcopy -n 1048576 -b 256: No such file or directory
INFO |-> [rocprofv2]
ERROR Profiling execution failed.
However, if I put ./vcopy -n 1048576 -b 256 into a shell script and call profile with omniperf profile -n vcopy -- ./test.sh, it works.
My locale settings look like this:
LANG=en_US.UTF-8
LANGUAGE=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8
BTW, I could use omniperf without any problem yesterday, but it failed after a docker container restart, so I think it's most likely an environment problem. It's hard to figure out what causes it, though.
I can work around this issue with:
export ROCPROF=rocprof
But rocprofv2 is recommended, right?
@aska-0096 if we assume the vcopy executable is being compiled properly and you can confirm that with a quick sanity check (i.e. ./vcopy -n 1048576 -b 256), I would guess that the docker container is being reloaded incorrectly?
One common issue is that the container isn't being loaded with the proper permissions/groups. For reference I usually use:
$ docker run -it --network=host --device=/dev/kfd --device=/dev/dri/renderD128 --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined <image-id>
Give this a try and if you still face issues we can schedule a meeting to debug. Thanks.
I came across this issue too; it works after putting the command to be executed into a shell script. Could you help fix this bug?
This issue is related to a rocprofv2 change: in the latest versions of ROCm, rocprofv2 uses exec to handle arg parsing. A slight logic change was required on Omniperf's end to account for this. We've pushed a patch to our dev branch, and the fix will be available in our next release.
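To illustrate the general failure mode (not the actual Omniperf patch or rocprofv2's code), here is a small Python sketch, assuming a POSIX system. The logs above show the whole string "./vcopy -n 1048576 -b 256" reported as a missing file. That is what happens when a full command line reaches exec as a single program name rather than as a program plus separate arguments. The function names are hypothetical.

```python
import shlex
import subprocess
import sys


def run_as_single_token(command):
    """Pass the whole command line as one argv entry (the failing pattern)."""
    try:
        # exec looks for a file literally named e.g. "python -c ...".
        subprocess.run([command], check=True)
        return "ran"
    except FileNotFoundError:
        return "No such file or directory"


def run_as_argv(command):
    """Split the command line into program + arguments before executing it."""
    result = subprocess.run(
        shlex.split(command), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


# Stand-in for "./vcopy -n 1048576 -b 256": the current Python interpreter
# with two arguments, so the sketch runs without any ROCm tooling.
command = f"{shlex.quote(sys.executable)} -c \"print('ok')\""

if __name__ == "__main__":
    print(run_as_single_token(command))
    print(run_as_argv(command))
```

This would also explain why wrapping the command in a script such as ./test.sh works: the script path is a single token with no arguments, so it is a valid program name even when it is not split.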