Foldingathome ROCM GPU Support

Open codebam opened this issue 1 year ago • 8 comments

Describe the bug

This is essentially the bug described at https://foldingforum.org/viewtopic.php?p=358294. Work units download endlessly and never complete because they crash.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Use a GPU with ROCm support.
  2. Enable foldingathome and rocmPackages.clr.icd.
  3. Run foldingathome.
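
A minimal configuration matching these steps might look like the sketch below. Placing rocmPackages.clr.icd under hardware.opengl.extraPackages is an assumption, modelled on the fuller configuration posted further down in this thread, and is not confirmed as the exact setup used here.

```nix
{ pkgs, ... }:
{
  # Step 2: enable the Folding@home client service.
  services.foldingathome.enable = true;

  # Step 2 (assumed placement): make the ROCm OpenCL ICD available
  # to the graphics stack.
  hardware.opengl = {
    enable = true;
    extraPackages = [ pkgs.rocmPackages.clr.icd ];
  };
}
```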

Expected behavior

Work units are executed and completed on the GPU.

Additional context

Log: https://r2.seanbehan.ca/a083b097-4abb-482e-909f-204df82a6d81

Notify maintainers

@sergv

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

[user@system:~]$ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.8.6, NixOS, 24.05 (Uakari), 24.05.20240417.edd8117`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.2`
 - channels(root): `"nixos"`
 - nixpkgs: `/nix/store/18p0lvi8gzlcj0nwnm6rhaqza5kg3g1g-source`

codebam avatar Apr 17 '24 18:04 codebam

@codebam Could you please try playing with the extraPkgs argument of the Folding@home Nix package (defined at https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/misc/foldingathome/client.nix#L13C3-L13C12) to see whether explicitly adding libstdc++ there solves the problem? If so, it could be added to the FHS packages. Sadly, I don't have an AMD GPU with OpenCL support, so I cannot test the suggestion myself.

In the meantime, notifying the actual maintainer, @zimbatm.

sergv avatar Apr 19 '24 01:04 sergv

Sorry, I'm not sure how to do that. What package would provide libstdc++? I asked on Matrix and was told it should already be part of stdenv.

codebam avatar Apr 24 '24 21:04 codebam

I don't really know which package provides the C++ standard library. One possibility seems to be LLVM's: llvmPackages_17.libcxx, with the default being libcxx.

If it's already in stdenv, then it seems it could be accessed via pkgs.stdenv.cc.cc.lib, according to https://discourse.nixos.org/t/how-to-solve-libstdc-not-found-in-shell-nix/25458/15.
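
As an untested sketch, the two ideas could be combined by overriding the client package and pointing the service at the result. It assumes that pkgs.fahclient accepts an extraPkgs list and that the NixOS module exposes services.foldingathome.package; both depend on the nixpkgs revision in use.

```nix
{ pkgs, ... }:
{
  services.foldingathome = {
    enable = true;
    # Untested: rebuild the client's FHS environment with libstdc++ added.
    # Assumes the package takes an `extraPkgs` list argument.
    package = pkgs.fahclient.override {
      extraPkgs = [ pkgs.stdenv.cc.cc.lib ];
    };
  };
}
```

If the override is rejected with an "unexpected argument" error, the package in that revision does not take extraPkgs and this approach does not apply.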

Overall, I cannot suggest a way that is known to work because I cannot test my suggestions, so you'll need to find one yourself.

sergv avatar Apr 27 '24 20:04 sergv

Oh, I just realized extraPkgs isn't available. Thank you anyway for trying to help.

codebam avatar Apr 27 '24 23:04 codebam

I am encountering the same error as codebam.

Could you please try playing with the extraPkgs argument of the Folding@home Nix package (defined at https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/misc/foldingathome/client.nix#L13C3-L13C12) to see whether explicitly adding libstdc++ there solves the problem? If so, it could be added to the FHS packages. Sadly, I don't have an AMD GPU with OpenCL support, so I cannot test the suggestion myself.

I have been unable to get this to work. I tried just adding the library to the package's extraPkgs. I also tried explicitly adding it directly to each of the following inputs in pkgs/applications/science/misc/foldingathome/client.nix:

  1. fah-client's nativeBuildInputs
  2. fah-client's runtimeInputs
  3. fah-client's buildInputs
  4. main package's targetPkgs

Regardless of whether I used plain libcxx or rocmPackages.llvm.libcxx, I still got the original error:

opencl-device was set but OpenCL platform could not be found. ERROR:126: Neither CUDA nor OpenCL is available.

In case it helps, here are the relevant parts of my NixOS config:

{
  boot.initrd.kernelModules = ["amdgpu"];
  boot.kernelModules = ["kvm-amd"];

  environment.systemPackages = with pkgs; [
    radeontop
  ];

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
    extraPackages = with pkgs; [
      amdvlk
      clinfo
      rocmPackages.clr.icd
      rocmPackages.rocminfo
      rocmPackages.rocm-runtime
    ];
    extraPackages32 = with pkgs.driversi686Linux; [
      amdvlk
    ];
    setLdLibraryPath = true;
  };

  services.foldingathome.enable = true;

  # Heterogeneous-computing Interface for Portability (HIP)
  # https://rocm.docs.amd.com/projects/HIP/en/latest/index.html
  systemd.tmpfiles.rules = let
    rocmEnv = pkgs.symlinkJoin {
      name = "rocm-combined";
      paths = with pkgs.rocmPackages; [
        clr
        hipblas
        rocblas
      ];
    };
  in [
    "L+ /opt/rcom - - - - ${rocmEnv}"
  ];
}

lafrenierejm avatar Aug 26 '24 01:08 lafrenierejm

Can you share one of the binaries that doesn't execute? Most likely the binary ELF headers are looking for those libraries in traditional paths.

If that's the case then we have a problem. Binaries are getting downloaded by the folding client and executed directly. There isn't a good way to patchelf those (unless somebody wants to work on a source patch).

One workaround for that is to also set programs.nix-ld.enable, and then put the missing libraries in programs.nix-ld.libraries
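
A minimal sketch of that workaround, assuming libstdc++ is among the missing libraries (which libraries the downloaded binaries actually need has not been confirmed in this thread):

```nix
{ pkgs, ... }:
{
  # nix-ld provides a dynamic loader at the conventional path so that
  # unpatched binaries can start.
  programs.nix-ld.enable = true;

  # Libraries made visible to those binaries. The entry below is an
  # assumption; extend the list with whatever the binary reports missing.
  programs.nix-ld.libraries = [
    pkgs.stdenv.cc.cc.lib
  ];
}
```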

zimbatm avatar Aug 27 '24 08:08 zimbatm

One workaround for that is to also set programs.nix-ld.enable, and then put the missing libraries in programs.nix-ld.libraries

Thanks for the tip! Unfortunately, I haven't been able to get past the error even with these changes in place. I tried a couple of combinations of packages based on my incredibly limited understanding of ROCm, then tried just shoving all of the potential packages into nix-ld.libraries. Here's my relevant config with the "just include every potential package" approach:

{
  pkgs,
  ...
}: {
  boot.initrd.kernelModules = ["amdgpu"];
  boot.kernelModules = ["kvm-amd"];

  environment.systemPackages = with pkgs; [
    radeontop
  ];

  hardware.opengl = {
    enable = true;
    driSupport = true;
    driSupport32Bit = true;
    extraPackages = with pkgs; [
      amdvlk
      clinfo
      libcxx
      rocmPackages.clr.icd
      rocmPackages.hipblas
      rocmPackages.rocblas
      rocmPackages.rocm-runtime
      rocmPackages.rocminfo
      stdenv.cc.cc
    ];
    extraPackages32 = with pkgs.driversi686Linux; [
      amdvlk
    ];
    setLdLibraryPath = true;
  };

  programs.nix-ld.enable = true;
  programs.nix-ld.libraries = with pkgs; [
    amdvlk
    clinfo
    libcxx
    rocmPackages.clr
    rocmPackages.clr.icd
    rocmPackages.hipblas
    rocmPackages.rocblas
    rocmPackages.rocm-runtime
    rocmPackages.rocminfo
    stdenv.cc.cc
  ];

  services.foldingathome.enable = true;

  # Heterogeneous-computing Interface for Portability (HIP)
  # https://rocm.docs.amd.com/projects/HIP/en/latest/index.html
  systemd.tmpfiles.rules = let
    rocmEnv = pkgs.symlinkJoin {
      name = "rocm-combined";
      paths = with pkgs.rocmPackages; [
        clr
        hipblas
        rocblas
        rocm-runtime
      ];
    };
  in [
    "L+ /opt/rcom - - - - ${rocmEnv}"
  ];
}

lafrenierejm avatar Aug 30 '24 13:08 lafrenierejm

Can you share one of the binaries that doesn't execute? Most likely the binary ELF headers are looking for those libraries in traditional paths.

I'll try to grab one sometime this weekend.

lafrenierejm avatar Aug 30 '24 13:08 lafrenierejm

@lafrenierejm any luck with this? I see you currently have a good chunk of what you posted above in your public config.

Joseph-DiGiovanni avatar Mar 14 '25 03:03 Joseph-DiGiovanni

any luck with this? I see you currently have a good chunk of what you posted above in your public config.

@Joseph-DiGiovanni Unfortunately, no. I also haven't spent any time investigating this recently, so it's possible my configuration and/or the info I provided in this thread is out of date.

lafrenierejm avatar Mar 16 '25 04:03 lafrenierejm

Is there any update on this? I've also been trying...

Thiago-Assis-T avatar Apr 15 '25 22:04 Thiago-Assis-T

I can make this work with the following (taken from here), but only when I run fah-client directly. The systemd service seems to be broken but it's also broken on a different device with an Nvidia GPU, so I think that problem is unrelated.

hardware.amdgpu.opencl.enable = true;
systemd.tmpfiles.rules = ["L+ /opt/rocm/hip - - - - ${pkgs.rocmPackages.clr}"];
environment.variables.OCL_ICD_VENDORS = "${pkgs.rocmPackages.clr.icd}/etc/OpenCL/vendors/";

Could/should this be included in hardware.amdgpu.opencl.enable?
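
For illustration only, a local module that ties the two extra settings to the existing option might look like the sketch below. It is not the upstream implementation. It assumes hardware.amdgpu.opencl.enable exists in the nixpkgs revision in use and that the /opt/rocm/hip symlink and OCL_ICD_VENDORS are the only pieces missing.

```nix
{ config, lib, pkgs, ... }:
{
  # Apply the extra settings only when AMD OpenCL support is enabled.
  config = lib.mkIf config.hardware.amdgpu.opencl.enable {
    # Symlink the ROCm runtime to /opt/rocm/hip, as in the snippet above.
    systemd.tmpfiles.rules = [
      "L+ /opt/rocm/hip - - - - ${pkgs.rocmPackages.clr}"
    ];

    # Point the OpenCL ICD loader at the ROCm vendor file.
    environment.variables.OCL_ICD_VENDORS =
      "${pkgs.rocmPackages.clr.icd}/etc/OpenCL/vendors/";
  };
}
```

Importing this file from configuration.nix and setting hardware.amdgpu.opencl.enable = true should then be equivalent to the three lines above.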

DoctorDalek1963 avatar Jul 24 '25 23:07 DoctorDalek1963