Scalene error: received signal SIGSEGV when using PyTorch on ROCm
Describe the bug
When I run my PyTorch training code on ROCm with an AMD GPU, I get an ominous error:
$ scalene training.py
Scalene error: received signal SIGSEGV
When I run the same code with only CPU profiling, the error disappears:
$ scalene --cpu-only training.py
To Reproduce
Since my training code is rather large, I cannot with reasonable effort provide a minimal working example. However, note that simple PyTorch code actually works fine:
from torch import Tensor
from torch import rand


def pytorch_iterating_random_tensor():
    # Arrange
    dimension_0: int = 3
    dimension_1: int = 2

    # Act
    tensor: Tensor = rand(
        dimension_0,
        dimension_1,
    )

    # Assert
    assert isinstance(tensor, Tensor)
    for dimension_0_index in range(dimension_0):
        for dimension_1_index in range(dimension_1):
            assert 0 <= tensor[dimension_0_index][dimension_1_index] <= 1


if __name__ == '__main__':
    pytorch_iterating_random_tensor()
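Note that the simple example never leaves the CPU: unless the default device has been changed, `rand` allocates its tensor in host memory, so ROCm is not exercised at all. A slightly larger sketch that actually touches the GPU might help separate "PyTorch under Scalene" from "ROCm initialization under Scalene". The function name `gpu_smoke_test` and the matrix size are hypothetical choices made for illustration. The sketch relies on ROCm builds of PyTorch exposing AMD GPUs through the `torch.cuda` API.

```python
from typing import Optional


def gpu_smoke_test(size: int = 256) -> Optional[float]:
    """Run one matrix multiply on the GPU; return None if no GPU is usable.

    Hypothetical helper for narrowing down the crash, not part of Scalene.
    """
    try:
        import torch
    except ImportError:
        return None

    # ROCm builds of PyTorch report AMD GPUs through torch.cuda.
    if not torch.cuda.is_available():
        return None

    device = torch.device("cuda")
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    product = a @ b

    # Wait for the asynchronous GPU work to finish before reading the result.
    torch.cuda.synchronize()
    return product.sum().item()


if __name__ == "__main__":
    print("GPU result:", gpu_smoke_test())
```

If this script already segfaults under `scalene` but runs cleanly under `scalene --cpu-only`, that would point at GPU initialization rather than at anything specific to the training code.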
Expected behavior
I expected Scalene to profile a more complex PyTorch application without crashing, just as it does with the trivial one.
Desktop (please complete the following information):
- OS: Ubuntu 22.04 + ROCm 5.2
- Version: 1.5.14 (also tested with the current repository version)
Additional context
My script prints some of its output before the segfault occurs, so it seems likely that the initialization of ROCm/OpenML triggers the issue in Scalene.
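One way to find out which Python call is active when the signal arrives is the standard library's `faulthandler` module, which dumps the Python stack of all threads on fatal signals such as SIGSEGV. The following is a minimal sketch. The helper name `enable_crash_traceback` and the log file name are assumptions made for illustration, and whether the dump is still written when Scalene has installed its own signal handling has not been verified here.

```python
import faulthandler
from typing import IO, Optional

# Kept at module level: faulthandler needs the file to stay open
# for as long as the handler is enabled.
_crash_log: Optional[IO[str]] = None


def enable_crash_traceback(path: str = "crash_traceback.log") -> bool:
    """Write a Python traceback to `path` if the process hits a fatal signal.

    Hypothetical helper; call it as early as possible in the training script.
    """
    global _crash_log
    _crash_log = open(path, "w")
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=_crash_log, all_threads=True)
    return faulthandler.is_enabled()
```

Calling `enable_crash_traceback()` at the top of `training.py` and then rerunning `scalene training.py` should, if the handler survives, leave a traceback in the log file. Running with `python -X faulthandler` is an alternative when Scalene is not involved.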
According to the README, I believe only NVIDIA GPUs are supported for profiling.