
Building PyTorch w/o Docker?

gateway opened this issue 7 years ago • 22 comments

Hi, I'm trying to get my AMD system set up to run some PyTorch software. I'd prefer not to have to mess with Docker; is there a reason to use it?

Is there a way to build this w/o docker?

gateway avatar Jan 02 '19 18:01 gateway

Sure, make sure that you install the dependencies listed inside the Docker files and then follow the subsequent steps.
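
One quick way to see that dependency list from a checkout of the repository (a sketch; the docker/caffe2/jenkins path comes up later in this thread):

  # List the packages the CI Docker images install
  grep -rn "apt-get install\|yum install" docker/caffe2/jenkins/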

iotamudelta avatar Jan 02 '19 19:01 iotamudelta

An installation script would be very helpful. I would be grateful if you could provide one for the community!

iamkucuk avatar Jan 13 '19 16:01 iamkucuk

Any progress on that?

iamkucuk avatar Feb 20 '19 20:02 iamkucuk

@iotamudelta can you point me to the Docker file you are referring to? Is it that one?

This is what I did to compile PyTorch:

  1. Install the PyTorch dependencies, rocm-dev, and a bunch of ROCm libraries. CMake will tell you which ones are missing.

  2. Execute ./.jenkins/caffe2/build.sh; it hipifies the Caffe2 source code, generating the missing files required for the compilation. You might be able to just run python tools/amd_build/build_amd.py, but I have not tried it alone.

  3. Compile PyTorch as usual with python setup.py develop.

The compilation is still going, so I am not sure this is all I needed to do, but it looks good so far. hipcc uses a lot of memory; I had a few OOM errors that made me restart with make -j 1.
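
For reference, a condensed sketch of those three steps (the exact package names are assumptions; CMake will complain about whatever is still missing):

  # 1) ROCm compiler/runtime plus the ROCm math libraries
  sudo apt-get install rocm-dev rocblas miopen rocfft rocsparse rocrand

  # 2) Generate the HIP sources ('hipify' the tree)
  ./.jenkins/caffe2/build.sh    # or possibly just: python tools/amd_build/build_amd.py

  # 3) Build as usual; throttle parallelism if hipcc runs out of memory
  MAX_JOBS=1 python setup.py develop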

Delaunay avatar Feb 22 '19 15:02 Delaunay

@Delaunay yes, that Dockerfile is part of it - I'd recommend using https://github.com/ROCmSoftwarePlatform/pytorch/blob/master/docker/caffe2/jenkins/build.sh with "py2-clang7-rocmdeb-ubuntu16.04" as the argument if you build your own Docker image. A standalone Dockerfile is here: https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/Dockerfile
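
Concretely, building the image would look something like this (a sketch, assuming the script is run from the root of the ROCm PyTorch checkout):

  git clone --recursive https://github.com/ROCmSoftwarePlatform/pytorch
  cd pytorch
  ./docker/caffe2/jenkins/build.sh py2-clang7-rocmdeb-ubuntu16.04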

Yes, just running python tools/amd_build/build_amd.py is sufficient to hipify the full source.

How much RAM do you have? A good rule of thumb seems to be MAX_JOBS=(RAM in GB)/4.
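
For example, that rule of thumb can be applied automatically (a sketch assuming GNU free and a bash shell):

  # MAX_JOBS = total RAM in GB divided by 4, with a floor of 1
  RAM_GB=$(free -g | awk '/^Mem:/ {print $2}')
  export MAX_JOBS=$(( RAM_GB / 4 > 0 ? RAM_GB / 4 : 1 ))
  echo "Building with MAX_JOBS=$MAX_JOBS"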

iotamudelta avatar Feb 22 '19 16:02 iotamudelta

I only have 8 GB on that machine. I was able to compile PyTorch with Ninja (without it the installation fails), but the version I compiled is not functional.

Do you know whether this is an issue with the build configuration or whether the kernel is really missing? Thanks

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'Ellesmere [Radeon RX 470/480/570/570X/580/580X]'
>>> torch.cuda.max_memory_allocated(0)
1024
>>> t = torch.zeros((10, 10, 10), dtype=torch.float32)
>>> t.cuda()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/setepenre/rocm_pytorch/torch/tensor.py", line 70, in __repr__
    return torch._tensor_str._str(self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 285, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 203, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
  File "/home/setepenre/rocm_pytorch/torch/functional.py", line 222, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: No device code available for function: _Z21kernelPointwiseApply3I10TensorEQOpIfhEhffjLi1ELi1ELi1EEv10OffsetInfoIT0_T3_XT4_EES2_IT1_S4_XT5_EES2_IT2_S4_XT6_EES4_T_

Delaunay avatar Feb 23 '19 17:02 Delaunay

@Delaunay what GPU do you have? We currently need to compile specifically for a microarchitecture (changes to that are incoming). Export HCC_AMDGPU_TARGET to your uarch prior to building: either gfx803 (which we do not support well in PT; if you find issues, please report them), gfx900 (the Vega 64/Vega 56 generation; these work well), or gfx906 (Radeon VII; this should also work well).
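
For the RX 470/480/570/580 card from the session above, that means setting the variable before the build (uarch names as given here):

  # Must be exported before compiling; pick the target matching your GPU
  export HCC_AMDGPU_TARGET=gfx803    # gfx900 for Vega 56/64, gfx906 for Radeon VII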

iotamudelta avatar Feb 23 '19 21:02 iotamudelta

Thanks, I recompiled it overnight for gfx803 and it is working now. I only have one test failing on my side. Is it supposed to fail? If not, I can open another ticket and gather info on it.

======================================================================
FAIL: test_multinomial_invalid_probs_cuda (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/setepenre/rocm_pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2223, in test_multinomial_invalid_probs_cuda
    self._spawn_method(test_method, torch.Tensor([1, -1, 1]))
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2203, in _spawn_method
    self.fail(e)
AssertionError: False

Delaunay avatar Feb 25 '19 12:02 Delaunay

Yeah, that test works for me on gfx906, so please do open a ticket. I don't have a gfx803 setup currently, but I'll try to have a look at it when I do and have time. In the meantime, we can discuss how to root-cause it in that ticket.

Is that the only failing test? That'd be better than I thought, to be honest.

iotamudelta avatar Feb 25 '19 17:02 iotamudelta

This is what I got on my side overall with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py:

  • test_autograd: 919 tests in 161 s (6 skipped)
  • test_cuda: 154 tests in 19 s (77 skipped, 1 failed)

I also ran resnet18 & resnet50. I will do more testing later, but for now the timings look great.


For anyone stumbling upon this thread: below are the rough steps for compiling without Docker (a consolidated sketch follows the list):

  1. Install ROCm here
  2. Install PyTorch dependencies (I recommend using Ninja)
  3. Install ROCm PyTorch dependencies (some might already be installed)
    • rocrand, hiprand, rocblas, miopen, miopengemm, rocfft, rocsparse, rocm-cmake, rocm-dev, rocm-device-libs, rocm-libs, hcc, hip_base, hip_hcc, hip-thrust
  4. Clone the PyTorch repository
  5. 'Hipify' the PyTorch source by executing python tools/amd_build/build_amd.py
  6. You can set export USE_NINJA=1 and export MAX_JOBS=N (N=(RAM in GB)/4)
  7. python setup.py [develop|install]
  8. Make sure everything is working with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py
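
Put together as a single shell session, the steps above look roughly like this (a sketch; the apt-based install and exact package names are assumptions, adjust for your distro and ROCm version):

  # Steps 1 and 3: ROCm and the ROCm-side build dependencies
  sudo apt-get install rocrand hiprand rocblas miopen miopengemm rocfft rocsparse \
      rocm-cmake rocm-dev rocm-device-libs rocm-libs hcc hip_base hip_hcc hip-thrust

  # Step 4: clone PyTorch with its submodules
  git clone --recursive https://github.com/ROCmSoftwarePlatform/pytorch
  cd pytorch

  # Step 2: Python-side build dependencies (requirements.txt is assumed to cover them)
  pip install -r requirements.txt ninja

  # Step 5: hipify the sources
  python tools/amd_build/build_amd.py

  # Steps 6 and 7: build and install
  export USE_NINJA=1
  export MAX_JOBS=2    # (RAM in GB)/4, e.g. 8 GB -> 2
  python setup.py develop

  # Step 8: run the test suite
  PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py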

Delaunay avatar Feb 25 '19 23:02 Delaunay

Finally a proper answer! I can't thank you enough for this! Will try it ASAP!

iamkucuk avatar Feb 26 '19 09:02 iamkucuk

Quick questions: I don't have any info about Ninja. Is this the package manager you are talking about? Is there documentation on how to use it, and does using pip instead of Ninja cause any trouble? Where can I find a ROCm alternative for the magma-cuda dependency? Or should I just ignore it?

iamkucuk avatar Feb 27 '19 19:02 iamkucuk

Ninja is just the build system that PyTorch can use to compile itself. You do not have to use it; it is explained here.
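
In practice that amounts to something like this (a sketch, using the USE_NINJA switch from the steps above):

  pip install ninja         # Ninja itself is just an executable on PATH
  export USE_NINJA=1        # tell the PyTorch build to drive it
  python setup.py develop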

ROCm has rocBLAS and MIOpen for linear algebra and machine learning primitives, respectively. I did not see anything about MAGMA when I installed PyTorch.

Delaunay avatar Feb 27 '19 23:02 Delaunay

@Delaunay thanks for the info, I managed to build PyTorch from source on my box! I should mention that I had to install the Thrust HIP port to build Caffe2.

masahi avatar Feb 28 '19 11:02 masahi

Thanks, I updated the list of dependencies.

Delaunay avatar Feb 28 '19 12:02 Delaunay

https://github.com/ROCmSoftwarePlatform/pytorch/issues/337#issuecomment-467220107 doesn't seem to work for me. I get this error no matter what I try:

 By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hip", but
  CMake did not find one.

I'm willing to help debug the issue; I have all the dependencies already installed.
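
A typical first check for that error (an assumption, not a confirmed fix; it presumes HIP's CMake config ships under /opt/rocm) is to make the hip package findable:

  # hip-config.cmake usually lives under the ROCm prefix
  export CMAKE_PREFIX_PATH=/opt/rocm/hip:/opt/rocm:$CMAKE_PREFIX_PATH
  python setup.py develop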

hameerabbasi avatar Mar 12 '19 16:03 hameerabbasi

@Delaunay could you remove step #6 pertaining to HCC_AMDGPU_TARGET? The default is multi-arch now, and it's a debug flag of the compiler that I'd rather we not continue to exploit. :-)

iotamudelta avatar Mar 15 '19 19:03 iotamudelta

Nice, I updated it.

Delaunay avatar Mar 15 '19 22:03 Delaunay

@Delaunay Hi mate! I'm trying to build PyTorch your way; however, I'm experiencing some issues. Here is my script, can you check it out? https://gist.github.com/iamkucuk/c8f74ec6d4f91804d6ff3d1006f26040

iamkucuk avatar Apr 19 '19 21:04 iamkucuk

We added documentation for host installs here: https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Please note that this requires good knowledge of your operating system and its package manager, and that step 4) unfortunately makes alterations to the ROCm install itself - we are hoping to fix the latter in the future.

iotamudelta avatar Apr 23 '19 20:04 iotamudelta

Why don't you provide a script for the full installation process? PyTorch is becoming more popular, especially in the academic world.

iamkucuk avatar Apr 25 '19 06:04 iamkucuk

@msabony1966

dagamayank avatar Sep 20 '19 19:09 dagamayank

Closing this issue due to its age and the existence of our ROCm component build platform, TheRock.

Building PyTorch on bare metal (without Docker) using TheRock is available now, and it will become the standard flow for building ROCm PyTorch from source after the release of ROCm 7.0.

Please follow the guide here to get started.

If you encounter problems using TheRock, feel free to open new issues on the issues page. Thanks!

lucbruni-amd avatar Jun 26 '25 17:06 lucbruni-amd