
Building PyTorch w/o Docker?

gateway opened this issue 7 years ago • 22 comments

Hi, I'm trying to get my AMD system set up to run some PyTorch software. I'd prefer not to have to mess with Docker; is there a reason to use it?

Is there a way to build this w/o docker?

gateway avatar Jan 02 '19 18:01 gateway

Sure, make sure that you install the dependencies listed inside the Docker files and then follow the subsequent steps.
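
One quick way to see that dependency list from a checkout of the repository (a sketch; the docker/caffe2/jenkins path comes up later in this thread):

  # List the packages the CI Docker images install
  grep -rn "apt-get install\|yum install" docker/caffe2/jenkins/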

iotamudelta avatar Jan 02 '19 19:01 iotamudelta

An installation script would be very helpful. I would be grateful if you could provide one for the community!

iamkucuk avatar Jan 13 '19 16:01 iamkucuk

Any progress on that?

iamkucuk avatar Feb 20 '19 20:02 iamkucuk

@iotamudelta can you point me to the Docker file you are referring to? Is it that one?

This is what I did to compile PyTorch:

  1. Install the PyTorch dependencies, rocm-dev, and a bunch of ROCm libraries. CMake will tell you which ones are missing.

  2. Execute ./.jenkins/caffe2/build.sh; it hipifies the Caffe2 source code, generating the missing files required for the compilation. You might be able to just run python tools/amd_build/build_amd.py, but I have not tried it alone.

  3. Compile PyTorch as usual with python setup.py develop.

The compilation is still going, so I am not sure this is all I needed to do, but it looks good so far. hipcc uses a lot of memory; I had a few OOM errors that made me restart with make -j 1.
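
For reference, a condensed sketch of those three steps (the exact package names are assumptions; CMake will complain about whatever is still missing):

  # 1) ROCm compiler/runtime plus the ROCm math libraries
  sudo apt-get install rocm-dev rocblas miopen rocfft rocsparse rocrand

  # 2) Generate the HIP sources ('hipify' the tree)
  ./.jenkins/caffe2/build.sh    # or possibly just: python tools/amd_build/build_amd.py

  # 3) Build as usual; throttle parallelism if hipcc runs out of memory
  MAX_JOBS=1 python setup.py develop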

Delaunay avatar Feb 22 '19 15:02 Delaunay

@Delaunay yes, that Dockerfile is part of it - I'd recommend using https://github.com/ROCmSoftwarePlatform/pytorch/blob/master/docker/caffe2/jenkins/build.sh with "py2-clang7-rocmdeb-ubuntu16.04" as the argument if you build your own Docker image. A standalone Dockerfile is here: https://raw.githubusercontent.com/wiki/ROCmSoftwarePlatform/pytorch/Dockerfile
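
Concretely, building the image would look something like this (a sketch, assuming the script is run from the root of the ROCm PyTorch checkout):

  git clone --recursive https://github.com/ROCmSoftwarePlatform/pytorch
  cd pytorch
  ./docker/caffe2/jenkins/build.sh py2-clang7-rocmdeb-ubuntu16.04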

Yes, just running python tools/amd_build/build_amd.py is sufficient to hipify the full source.

How much RAM do you have? A good rule of thumb seems to be MAX_JOBS=(RAM in GB)/4.
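
For example, that rule of thumb can be applied automatically (a sketch assuming GNU free and a bash shell):

  # MAX_JOBS = total RAM in GB divided by 4, with a floor of 1
  RAM_GB=$(free -g | awk '/^Mem:/ {print $2}')
  export MAX_JOBS=$(( RAM_GB / 4 > 0 ? RAM_GB / 4 : 1 ))
  echo "Building with MAX_JOBS=$MAX_JOBS"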

iotamudelta avatar Feb 22 '19 16:02 iotamudelta

I only have 8 GB on that machine. I was able to compile PyTorch with Ninja (without it the installation fails), but the version I compiled is not functional.

Do you know whether this is an issue with the build configuration or whether the kernel is really missing? Thanks

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
'Ellesmere [Radeon RX 470/480/570/570X/580/580X]'
>>> torch.cuda.max_memory_allocated(0)
1024
>>> t = torch.zeros((10, 10, 10), dtype=torch.float32)
>>> t.cuda()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/setepenre/rocm_pytorch/torch/tensor.py", line 70, in __repr__
    return torch._tensor_str._str(self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 285, in _str
    tensor_str = _tensor_str(self, indent)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 203, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/setepenre/rocm_pytorch/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
  File "/home/setepenre/rocm_pytorch/torch/functional.py", line 222, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: No device code available for function: _Z21kernelPointwiseApply3I10TensorEQOpIfhEhffjLi1ELi1ELi1EEv10OffsetInfoIT0_T3_XT4_EES2_IT1_S4_XT5_EES2_IT2_S4_XT6_EES4_T_

Delaunay avatar Feb 23 '19 17:02 Delaunay

@Delaunay what GPU do you have? We currently need to compile specifically for a microarchitecture (changes to that are incoming). Export HCC_AMDGPU_TARGET to your uarch prior to building: either gfx803 (which we do not support well in PT; if you find issues, please report them), gfx900 (the Vega 64/Vega 56 generation; these work well), or gfx906 (Radeon VII; this should also work well).
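
For the RX 470/480/570/580 card from the session above, that means setting the variable before the build (uarch names as given here):

  # Must be exported before compiling; pick the target matching your GPU
  export HCC_AMDGPU_TARGET=gfx803    # gfx900 for Vega 56/64, gfx906 for Radeon VII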

iotamudelta avatar Feb 23 '19 21:02 iotamudelta

Thanks, I recompiled it overnight for gfx803 and it is working now. I only have one test failing on my side. Is it supposed to fail? If not, I can open another ticket and gather info on it.

======================================================================
FAIL: test_multinomial_invalid_probs_cuda (test_cuda.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/setepenre/rocm_pytorch/test/common_utils.py", line 296, in wrapper
    method(*args, **kwargs)
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2223, in test_multinomial_invalid_probs_cuda
    self._spawn_method(test_method, torch.Tensor([1, -1, 1]))
  File "/home/setepenre/rocm_pytorch/test/test_cuda.py", line 2203, in _spawn_method
    self.fail(e)
AssertionError: False

Delaunay avatar Feb 25 '19 12:02 Delaunay

Yeah, that test works for me on gfx906, so please do open a ticket. I don't have a gfx803 setup currently, but I'll try to have a look at it when I do and have time. In the meantime, we can discuss how to root-cause it in that ticket.

Is that the only failing test? That'd be better than I thought, to be honest.

iotamudelta avatar Feb 25 '19 17:02 iotamudelta

This is what I got on my side overall with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py:

  • test_autograd: 919 tests in 161 s (6 skipped)
  • test_cuda: 154 tests in 19 s (77 skipped, 1 failed)

I also ran resnet18 & resnet50. I will do more testing later, but for now the timings look great.


For anyone stumbling upon this thread: below are the rough steps for compiling without Docker (a consolidated sketch follows the list):

  1. Install ROCm here
  2. Install PyTorch dependencies (I recommend using Ninja)
  3. Install ROCm PyTorch dependencies (some might already be installed)
    • rocrand, hiprand, rocblas, miopen, miopengemm, rocfft, rocsparse, rocm-cmake, rocm-dev, rocm-device-libs, rocm-libs, hcc, hip_base, hip_hcc, hip-thrust
  4. Clone the PyTorch repository
  5. 'Hipify' the PyTorch source by executing python tools/amd_build/build_amd.py
  6. You can set export USE_NINJA=1 and export MAX_JOBS=N (N=(RAM in GB)/4)
  7. python setup.py [develop|install]
  8. Make sure everything is working with PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py
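
Put together as a single shell session, the steps above look roughly like this (a sketch; the apt-based install and exact package names are assumptions, adjust for your distro and ROCm version):

  # Steps 1 and 3: ROCm and the ROCm-side build dependencies
  sudo apt-get install rocrand hiprand rocblas miopen miopengemm rocfft rocsparse \
      rocm-cmake rocm-dev rocm-device-libs rocm-libs hcc hip_base hip_hcc hip-thrust

  # Step 4: clone PyTorch with its submodules
  git clone --recursive https://github.com/ROCmSoftwarePlatform/pytorch
  cd pytorch

  # Step 2: Python-side build dependencies (requirements.txt is assumed to cover them)
  pip install -r requirements.txt ninja

  # Step 5: hipify the sources
  python tools/amd_build/build_amd.py

  # Steps 6 and 7: build and install
  export USE_NINJA=1
  export MAX_JOBS=2    # (RAM in GB)/4, e.g. 8 GB -> 2
  python setup.py develop

  # Step 8: run the test suite
  PYTORCH_TEST_WITH_ROCM=1 python test/run_test.py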

Delaunay avatar Feb 25 '19 23:02 Delaunay

Finally a proper answer! I can't thank you enough for this! Will try it ASAP!

iamkucuk avatar Feb 26 '19 09:02 iamkucuk

Quick questions: I don't have any info about Ninja. Is this the package manager you are talking about? Is there documentation on how to use it, and does using pip instead of Ninja cause any trouble? Where can I find a ROCm alternative for the magma-cuda dependency? Or should I just ignore it?

iamkucuk avatar Feb 27 '19 19:02 iamkucuk

Ninja is just the build system that PyTorch can use to compile itself. You do not have to use it; it is explained here.
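
In practice that amounts to something like this (a sketch, using the USE_NINJA switch from the steps above):

  pip install ninja         # Ninja itself is just an executable on PATH
  export USE_NINJA=1        # tell the PyTorch build to drive it
  python setup.py develop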

ROCm has rocBLAS and MIOpen for linear algebra and machine learning primitives, respectively. I did not see anything about MAGMA when I installed PyTorch.

Delaunay avatar Feb 27 '19 23:02 Delaunay

@Delaunay thanks for the info, I managed to build PyTorch from source on my box! I should mention that I had to install the Thrust HIP port to build Caffe2.

masahi avatar Feb 28 '19 11:02 masahi

Thanks, I updated the list of dependencies.

Delaunay avatar Feb 28 '19 12:02 Delaunay

https://github.com/ROCmSoftwarePlatform/pytorch/issues/337#issuecomment-467220107 doesn't seem to work for me. I get this error no matter what I try:

 By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "hip", but
  CMake did not find one.

I'm willing to help debug the issue; I have all the dependencies already installed.
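
A typical first check for that error (an assumption, not a confirmed fix; it presumes HIP's CMake config ships under /opt/rocm) is to make the hip package findable:

  # hip-config.cmake usually lives under the ROCm prefix
  export CMAKE_PREFIX_PATH=/opt/rocm/hip:/opt/rocm:$CMAKE_PREFIX_PATH
  python setup.py develop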

hameerabbasi avatar Mar 12 '19 16:03 hameerabbasi

@Delaunay could you remove step #6 pertaining to HCC_AMDGPU_TARGET? The default is multi-arch now, and it's a debug flag of the compiler that I'd rather we not continue to exploit. :-)

iotamudelta avatar Mar 15 '19 19:03 iotamudelta

Nice, I updated it.

Delaunay avatar Mar 15 '19 22:03 Delaunay

@Delaunay Hi mate! I'm trying to build PyTorch your way; however, I'm experiencing some issues. Here is my script, can you check it out? https://gist.github.com/iamkucuk/c8f74ec6d4f91804d6ff3d1006f26040

iamkucuk avatar Apr 19 '19 21:04 iamkucuk

We added documentation for host installs here: https://github.com/ROCmSoftwarePlatform/pytorch/wiki/Building-PyTorch-for-ROCm#option-4-install-directly-on-host

Please note that this requires good knowledge of your operating system and its package manager, and that step 4) unfortunately makes alterations to the ROCm install itself - we are hoping to fix the latter in the future.

iotamudelta avatar Apr 23 '19 20:04 iotamudelta

Why don't you provide a script for the full installation process? PyTorch is becoming more popular, especially in the academic world.

iamkucuk avatar Apr 25 '19 06:04 iamkucuk

@msabony1966

dagamayank avatar Sep 20 '19 19:09 dagamayank

Closing this issue due to its age and the existence of our ROCm component build platform, TheRock.

Building PyTorch on bare metal (without Docker) using TheRock is available now, and it will become the standard flow for building ROCm PyTorch from source after the release of ROCm 7.0.

Please follow the guide here to get started.

If you encounter problems using TheRock, feel free to open new issues on the issues page. Thanks!

lucbruni-amd avatar Jun 26 '25 17:06 lucbruni-amd