
[BUG] oneapi/ccl.hpp: No such file or directory.

Open weiji14 opened this issue 1 year ago • 20 comments

Describe the bug

The builds on conda-forge have been failing since deepspeed=0.14.1 for CUDA 11.8 and 12.0 with an error like fatal error: oneapi/ccl.hpp: No such file or directory. Originally reported at https://github.com/conda-forge/deepspeed-feedstock/pull/56#issuecomment-2062611899.

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://github.com/conda-forge/deepspeed-feedstock/pull/57 and clone the branch
  2. Run python build_locally.py locally, select the option with CUDA 11.8 and Python 3.9
  3. See error below

Expected behavior

CUDA builds work as expected.

ds_report output

Note: this isn't the exact report from the conda-forge CI machine; I copied this from the CPU build logs.

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [FAIL]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented  [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['$PREFIX/lib/python3.9/site-packages/torch']
torch version .................... 2.3.0.post101
deepspeed install path ........... ['$PREFIX/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.14.3, f492cfc, HEAD
deepspeed wheel compiled w. ...... torch 0.0 
shared memory (/dev/shm) size .... 64.00 MB

Screenshots

Truncated traceback from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=953875&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1&l=3486:

2024-06-12T22:49:04.3043574Z   building 'deepspeed.ops.comm.deepspeed_ccl_comm_op' extension
2024-06-12T22:49:04.3043994Z   creating build/temp.linux-x86_64-cpython-39
2024-06-12T22:49:04.3053293Z   creating build/temp.linux-x86_64-cpython-39/csrc
2024-06-12T22:49:04.3054113Z   creating build/temp.linux-x86_64-cpython-39/csrc/cpu
2024-06-12T22:49:04.3054729Z   creating build/temp.linux-x86_64-cpython-39/csrc/cpu/comm
2024-06-12T22:49:04.3071806Z   /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work=/usr/local/src/conda/deepspeed-0.14.3 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -isystem /usr/local/cuda/include -fPIC 
-I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/csrc/cpu/includes -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/TH -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/THC -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/python3.9 -c csrc/cpu/comm/ccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/cpu/comm/ccl.o -O2 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm_op -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2024-06-12T22:49:08.2062484Z   csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
2024-06-12T22:49:08.2067800Z       8 | #include <oneapi/ccl.hpp>
2024-06-12T22:49:08.2068222Z         |          ^~~~~~~~~~~~~~~~
2024-06-12T22:49:08.2068507Z   compilation terminated.
2024-06-12T22:49:08.2182741Z   error: command '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
2024-06-12T22:49:08.6174937Z   error: subprocess-exited-with-error
2024-06-12T22:49:08.6176666Z   
2024-06-12T22:49:08.6188012Z   × python setup.py bdist_wheel did not run successfully.
2024-06-12T22:49:08.6227487Z   │ exit code: 1
2024-06-12T22:49:08.6240717Z   ╰─> See above for output.
2024-06-12T22:49:08.6252920Z   
2024-06-12T22:49:08.6264017Z   note: This error originates from a subprocess, and is likely not a problem with pip.
2024-06-12T22:49:08.6271330Z   full command: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/bin/python -u -c '
2024-06-12T22:49:08.6272043Z   exec(compile('"'"''"'"''"'"'
2024-06-12T22:49:08.6277838Z   # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
2024-06-12T22:49:08.6283726Z   #
2024-06-12T22:49:08.6284428Z   # - It imports setuptools before invoking setup.py, to enable projects that directly
2024-06-12T22:49:08.6289287Z   #   import from `distutils.core` to work with newer packaging standards.
2024-06-12T22:49:08.6289949Z   # - It provides a clear error message when setuptools is not installed.
2024-06-12T22:49:08.6295383Z   # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
2024-06-12T22:49:08.6295837Z   #   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
2024-06-12T22:49:08.6301077Z   #     manifest_maker: standard file '"'"'-c'"'"' not found".
2024-06-12T22:49:08.6307069Z   # - It generates a shim setup.py, for handling setup.cfg-only projects.
2024-06-12T22:49:08.6307810Z   import os, sys, tokenize
2024-06-12T22:49:08.6314125Z   
2024-06-12T22:49:08.6314907Z   try:
2024-06-12T22:49:08.6320956Z       import setuptools
2024-06-12T22:49:08.6321316Z   except ImportError as error:
2024-06-12T22:49:08.6325049Z       print(
2024-06-12T22:49:08.6326023Z           "ERROR: Can not execute `setup.py` since setuptools is not available in "
2024-06-12T22:49:08.6335095Z           "the build environment.",
2024-06-12T22:49:08.6335348Z           file=sys.stderr,
2024-06-12T22:49:08.6338543Z       )
2024-06-12T22:49:08.6338832Z       sys.exit(1)
2024-06-12T22:49:08.6339045Z   
2024-06-12T22:49:08.6339554Z   __file__ = %r
2024-06-12T22:49:08.6340070Z   sys.argv[0] = __file__
2024-06-12T22:49:08.6340336Z   
2024-06-12T22:49:08.6340562Z   if os.path.exists(__file__):
2024-06-12T22:49:08.6340835Z       filename = __file__
2024-06-12T22:49:08.6341059Z       with tokenize.open(__file__) as f:
2024-06-12T22:49:08.6341411Z           setup_py_code = f.read()
2024-06-12T22:49:08.6341621Z   else:
2024-06-12T22:49:08.6341993Z       filename = "<auto-generated setuptools caller>"
2024-06-12T22:49:08.6342280Z       setup_py_code = "from setuptools import setup; setup()"
2024-06-12T22:49:08.6342576Z   
2024-06-12T22:49:08.6342895Z   exec(compile(setup_py_code, filename, "exec"))
2024-06-12T22:49:08.6343569Z   '"'"''"'"''"'"' % ('"'"'/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-v4pibtb1
2024-06-12T22:49:08.6344021Z   cwd: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/
2024-06-12T22:49:08.6344416Z   Building wheel for deepspeed (setup.py): finished with status 'error'
2024-06-12T22:49:08.6348857Z   ERROR: Failed building wheel for deepspeed
2024-06-12T22:49:08.6349126Z   Running setup.py clean for deepspeed
2024-06-12T22:49:08.6349635Z   Running command python setup.py clean
2024-06-12T22:49:10.9424015Z   No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2024-06-12T22:49:10.9498275Z   [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
2024-06-12T22:49:10.9509931Z   DS_BUILD_OPS=1
2024-06-12T22:49:17.9894660Z   Install Ops={'deepspeed_not_implemented': 1, 'deepspeed_ccl_comm': 1, 'deepspeed_shm_comm': 1, 'cpu_adam': 1, 'fused_adam': 1}
2024-06-12T22:49:18.0269777Z   version=0.14.3, git_hash=f492cfc, git_branch=HEAD
2024-06-12T22:49:18.0270870Z   install_requires=['hjson', 'ninja', 'numpy', 'nvidia-ml-py', 'packaging>=20.0', 'psutil', 'py-cpuinfo', 'pydantic', 'torch', 'tqdm']
2024-06-12T22:49:18.0278897Z   ext_modules=[<setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_not_implemented_op') at 0x7ff248dbe460>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_ccl_comm_op') at 0x7ff248dbe4c0>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_shm_comm_op') at 0x7ff248dbe520>, <setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7ff16dd51b80>, <setuptools.extension.Extension('deepspeed.ops.adam.fused_adam_op') at 0x7ff16dd51d90>]
2024-06-12T22:49:18.0651351Z   running clean
2024-06-12T22:49:18.0714575Z   removing 'build/temp.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.0715596Z   removing 'build/lib.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.1133919Z   'build/bdist.linux-x86_64' does not exist -- can't clean it
2024-06-12T22:49:18.1143315Z   'build/scripts-3.9' does not exist -- can't clean it
2024-06-12T22:49:18.1151899Z   removing 'build'
2024-06-12T22:49:18.1171105Z   deepspeed build time = 0.08735942840576172 secs
2024-06-12T22:49:18.5956605Z Failed to build deepspeed
2024-06-12T22:49:18.5973299Z ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.5973660Z Exception information:
2024-06-12T22:49:18.5986009Z Traceback (most recent call last):
2024-06-12T22:49:18.5993394Z   File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
2024-06-12T22:49:18.5993851Z     status = run_func(*args)
2024-06-12T22:49:18.5994316Z   File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
2024-06-12T22:49:18.5995399Z     return func(self, options, args)
2024-06-12T22:49:18.5999462Z   File "$PREFIX/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 429, in run
2024-06-12T22:49:18.5999708Z     raise InstallationError(
2024-06-12T22:49:18.6006202Z pip._internal.exceptions.InstallationError: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.6010801Z Removed build tracker: '/tmp/pip-build-tracker-gvbib0oo'
2024-06-12T22:49:20.4656669Z Traceback (most recent call last):
2024-06-12T22:49:20.4664375Z   File "/opt/conda/bin/conda-build", line 11, in <module>
2024-06-12T22:49:20.4669893Z     sys.exit(execute())
2024-06-12T22:49:20.4670437Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/cli/main_build.py", line 590, in execute
2024-06-12T22:49:20.4677725Z     api.build(
2024-06-12T22:49:20.4678886Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
2024-06-12T22:49:20.4685860Z     return build_tree(
2024-06-12T22:49:20.4691479Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3638, in build_tree
2024-06-12T22:49:20.4708481Z     packages_from_this = build(
2024-06-12T22:49:20.4713969Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2506, in build
2024-06-12T22:49:20.4714313Z     utils.check_call_env(
2024-06-12T22:49:20.4724506Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 405, in check_call_env
2024-06-12T22:49:20.4729616Z     return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
2024-06-12T22:49:20.4730205Z   File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 381, in _func_defaulting_env_to_os_environ
2024-06-12T22:49:20.4735612Z     raise subprocess.CalledProcessError(proc.returncode, _args)
2024-06-12T22:49:20.4736400Z subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/conda_build.sh']' returned non-zero exit status 1.
2024-06-12T22:49:30.4784588Z 
2024-06-12T22:49:30.5793301Z ##[error]Bash exited with code '1'.
2024-06-12T22:49:30.5974127Z ##[section]Finishing: Run docker build

System info:

  • OS: Ubuntu 22.04.4
  • GPU count and types: 1 NVIDIA GPU
  • Interconnects (if applicable): N/A
  • Python version: 3.9
  • Any other relevant info about your setup: None

Launcher context: No, not launching with the deepspeed launcher, MPI, or anything else.

Docker context Are you using a specific docker image that you can share?

quay.io/condaforge/linux-anvil-cuda:11.8

Additional context

The builds have been failing in these PRs as well:

  • deepspeed 0.14.2 - https://github.com/conda-forge/deepspeed-feedstock/pull/57#issuecomment-2078635322
  • deepspeed 0.14.3 - https://github.com/conda-forge/deepspeed-feedstock/pull/62

weiji14 avatar Jun 12 '24 23:06 weiji14

Thanks @weiji14 for opening this to track.

loadams avatar Jun 13 '24 21:06 loadams

Hello, any update on this issue?

tgkul avatar Aug 09 '24 00:08 tgkul

Following the instructions here to install oneccl-devel from Intel:

conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel

solved this problem.

SnzFor16Min avatar Aug 11 '24 07:08 SnzFor16Min

Thanks so much @SnzFor16Min for pointing me to that oneccl-devel package (which is also on the conda-forge channel at https://anaconda.org/conda-forge/oneccl-devel/files). As a temporary workaround, I've managed to build deepspeed=0.14.4 at https://github.com/conda-forge/deepspeed-feedstock/pull/63 by adding oneccl-devel to the host dependencies.

That said, I'm still unsure if this issue should be closed, because this Intel oneAPI Toolkit should only be needed for CPU builds, not CUDA (GPU) builds, no? As mentioned at https://github.com/conda-forge/deepspeed-feedstock/pull/56#issuecomment-2065192465:

Do we need to get that oneapi/ccl.hpp file from somewhere? Don't quite get it since these are CUDA (GPU) builds, not CPU builds.

@weiji14 - It looks like this comes from the Intel extensions for pytorch, but we shouldn't need that, and some DeepSpeed tests should have caught that. I'll take a look soon to see if I can tell why we are hitting this here.

Will leave this up to @loadams and the deepspeed team to resolve.

weiji14 avatar Aug 11 '24 23:08 weiji14

I'm no expert in building DeepSpeed, but since I see DS_BUILD_OPS=1 in the traceback, perhaps @weiji14 you should check whether the build script was also pre-compiling CPU ops (e.g., DS_BUILD_CPU_ADAM), which might require oneAPI libraries even on a GPU build. This is mentioned in the DeepSpeed documentation.
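To make that check concrete, here is a hedged sketch of selectively pre-building ops with per-op flags instead of the blanket DS_BUILD_OPS=1. The flag names DS_BUILD_FUSED_ADAM and DS_BUILD_CPU_ADAM follow the DeepSpeed build documentation, but whether your build script sets them is an assumption to verify; the install line is left commented out as a sketch.

```shell
# Sketch: prebuild only selected ops rather than everything via DS_BUILD_OPS=1.
# Flag names per the DeepSpeed build docs; verify against your build script.
export DS_BUILD_FUSED_ADAM=1   # GPU op
export DS_BUILD_CPU_ADAM=0     # CPU op; skipping it avoids CPU-side dependencies
# pip install deepspeed        # commented out: run in your actual build environment
echo "DS_BUILD_FUSED_ADAM=$DS_BUILD_FUSED_ADAM DS_BUILD_CPU_ADAM=$DS_BUILD_CPU_ADAM"
```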

SnzFor16Min avatar Aug 12 '24 02:08 SnzFor16Min

Ah yes, the DS_BUILD_OPS=1 flag is set at https://github.com/conda-forge/deepspeed-feedstock/blame/b0193a708c3f1f6864e2a85f7cbdf92ee3bf39ff/recipe/build.sh#L4-L6, and DS_BUILD_CPU_ADAM might be enabled as a result (I haven't checked those build flags for almost a year). Maybe it doesn't hurt to compile with the CPU ops enabled even on CUDA?

weiji14 avatar Aug 12 '24 03:08 weiji14

@weiji14 - this should be fine to add to the dependencies; it should not cause any issues on the CUDA builds.

Also, it should be fine to leave DS_BUILD_OPS=1; that should have all ops enabled, including the CPU ops.

I'd say let's leave this open for now. I'll check back to confirm we have no issues reported from users, and we can also confirm the flow works with the next DeepSpeed release.

loadams avatar Aug 12 '24 17:08 loadams

C:\ComfyUI\ComfyUI\custom_nodes\EasyAnimate>conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel
Channels:
 - https://software.repos.intel.com/python/conda
 - conda-forge
 - defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: failed

PackagesNotFoundError: The following packages are not available from current channels:

Is this package no longer installable?

whois206 avatar Oct 15 '24 11:10 whois206

@delock, can you help with this?

tjruwase avatar Oct 15 '24 13:10 tjruwase

@whois206 - I had no problems running the command that you listed:

(base) test@deepspeed:~$ conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel
Channels:
 - https://software.repos.intel.com/python/conda
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/deepspeed/miniconda3

  added / updated specs:
    - oneccl-devel


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |            2_gnu          23 KB  conda-forge
    certifi-2024.8.30          |     pyhd8ed1ab_0         160 KB  conda-forge
    impi_rt-2021.14.0          |        intel_790        52.6 MB  https://software.repos.intel.com/python/conda
    intel-cmplr-lib-rt-2025.0.0|       intel_1169        38.3 MB  https://software.repos.intel.com/python/conda
    intel-cmplr-lib-ur-2025.0.0|       intel_1169         5.6 MB  https://software.repos.intel.com/python/conda
    intel-cmplr-lic-rt-2025.0.0|       intel_1169          20 KB  https://software.repos.intel.com/python/conda
    intel-sycl-rt-2025.0.0     |       intel_1169         5.1 MB  https://software.repos.intel.com/python/conda

One thing I do notice in your logs is that your platform is win-64 - do you have a Windows node? And if so, could you let us know what accelerator you are trying to use?

loadams avatar Oct 31 '24 16:10 loadams

I read the initial description. From ds_report it looks like @weiji14 is reporting a non-CUDA system: the ops listed in ds_report indicate that the CPU accelerator is selected. Is this intended?

If running on CPU is intended, this is the guide for installing dependencies on an Intel CPU: https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/accelerator-setup-guide.md#installation-steps-for-intel-architecture-cpu

If you only run DeepSpeed on a single machine, the steps above should be enough. I'm wondering why the system wants to build deepspeed_ccl_comm with oneCCL; this is usually optional, and by default DeepSpeed only attempts to build deepspeed_ccl_comm when oneccl-binding-pt is installed.

For @whois206's question, I need to check whether there is an oneccl-devel package for Windows. Will answer later.

delock avatar Nov 02 '24 11:11 delock

@whois206 The oneccl-devel package does not have Windows support. I checked the oneCCL product page https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl-download.html, and there seems to be no Windows download package.

delock avatar Nov 04 '24 08:11 delock

@loadams this is completely broken. Here's a full failing build pipeline which both fails to install the oneccl-devel package and then tries to run build_win, failing because of csrc/cpu/comm/ccl.cpp(8): fatal error C1083: Cannot open include file: 'oneapi/ccl.hpp': No such file or directory: https://github.com/rsxdalv/DeepSpeed/actions/runs/14977741000/job/42074217352#step:10:222

Given that PyTorch now suggests installing with pip by default, it's commonly installed without the CUDA toolkit and nvcc; therefore the official deepspeed wheels, which rely on JIT compilation, do not work. I'm guessing there might be a way to disable the 'deepspeed.ops.comm.deepspeed_ccl_comm_op' extension or make it JIT-built to prevent the failure, but no explanation or instruction about that exists online. This can break end-user installations: if deepspeed gets installed, transformers models that worked before then break.
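One possibility, untested here and offered only as a sketch: DeepSpeed's op builders are controlled by per-op DS_BUILD_* environment variables, and DS_BUILD_CCL_COMM is assumed to be the switch for the oneCCL comm op (confirm against your DeepSpeed version). Disabling just that op while prebuilding the rest might look like:

```shell
# Sketch, assuming DS_BUILD_CCL_COMM is honored by your DeepSpeed version:
# prebuild all ops except the oneCCL comm op that needs oneapi/ccl.hpp.
export DS_BUILD_OPS=1
export DS_BUILD_CCL_COMM=0
# pip install deepspeed   # commented out: run inside the actual build environment
echo "DS_BUILD_OPS=$DS_BUILD_OPS DS_BUILD_CCL_COMM=$DS_BUILD_CCL_COMM"
```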

My only suggestion is to either install NVCC and deepspeed wheels, or just uninstall deepspeed, which I chose for my project.

rsxdalv avatar May 12 '25 17:05 rsxdalv

@rsxdalv - can you elaborate a little more on what specific failure you are hitting that brought you to this? You are installing the OneAPI libs and then building on Windows but targeting cuda/nvidia devices (not cpu here)?

The current DeepSpeed on Windows whls we publish are really only for Nvidia/NVCC; we don't guarantee support for a broader set of hardware on Windows currently. But I want to understand your use case to better understand what is broken.

loadams avatar May 12 '25 19:05 loadams

@rsxdalv - can you elaborate a little more on what specific failure you are hitting that brought you to this? You are installing the OneAPI libs and then building on Windows but targeting cuda/nvidia devices (not cpu here)?

The current DeepSpeed on Windows whls we publish are really only for Nvidia/NVCC; we don't guarantee support for a broader set of hardware on Windows currently. But I want to understand your use case to better understand what is broken.

Exactly, I want an Nvidia-optimized DeepSpeed wheel. The currently published wheel requires NVCC, for example in this case:

from transformers import VitsModel

model = VitsModel.from_pretrained(
    "facebook/mms-tts-eng",
)

It will fail if the user has deepspeed installed but no nvcc. Thus I am looking to compile a specific deepspeed wheel with ops precompiled (at least the important ones). I have seen ways to bypass many ops, but not the Intel ones. Trying to build the Intel ones anyway, there is a roadblock: the oneccl-devel conda package is linux-64 only. So despite using build_win and my best attempts, I would be forced to build my own oneccl-devel for Windows, which I gave up on.

Looking at this issue and https://github.com/deepspeedai/DeepSpeed/issues/7057, which also fails on oneapi/ccl.hpp, the current status is that Windows builds are somewhere between impossible and undocumented.

I think if there were a way to just say BUILD_INTEL_STUFF=0, everyone would be happy; my hope is that it's just the deepspeed.ops.comm.deepspeed_ccl_comm_op extension that causes this issue.

rsxdalv avatar May 13 '25 14:05 rsxdalv

Hi @rsxdalv, you can override DeepSpeed device detection by setting the environment variable DS_ACCELERATOR=cuda before installing and running DeepSpeed.

Besides, can you post the output of pip list in your environment? I wonder what Python packages are there that make DeepSpeed select the CPU accelerator, thanks!
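For concreteness, a minimal sketch of that override, assuming the usual pip-based workflow (the install and launch lines are commented out as illustrations, and the launch command shown is hypothetical):

```shell
# Force DeepSpeed's accelerator detection to CUDA, per the suggestion above.
export DS_ACCELERATOR=cuda
# pip install deepspeed           # sketch: run in your real environment
# deepspeed --num_gpus=1 app.py   # hypothetical launch command
echo "DS_ACCELERATOR=$DS_ACCELERATOR"
```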

delock avatar May 16 '25 03:05 delock

Hi @rsxdalv, you can override DeepSpeed device detection by setting the environment variable DS_ACCELERATOR=cuda before installing and running DeepSpeed.

Besides, can you post the output of pip list in your environment? I wonder what Python packages are there that make DeepSpeed select the CPU accelerator, thanks!

Thanks, this time the build worked.

The build environment was basically just:

          python -m pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
          python -m pip install wheel ninja numpy psutil build setuptools==72.2.0

Now, I tried to go to the next step and use DS_BUILD_OPS=1, but I got about 1000 lines of warnings and 2000 lines of errors, so I'm going to take a break from it. https://github.com/rsxdalv/DeepSpeed/actions/runs/15073236896/job/42374559991#step:11:4957

rsxdalv avatar May 16 '25 20:05 rsxdalv

One suggestion is not to build ops at installation time and to let the op builder compile them just in time. That way, only the essential ops will be built.

delock avatar May 17 '25 01:05 delock

One suggestion is not to build ops at installation time and to let the op builder compile them just in time. That way, only the essential ops will be built.

I understand that, and before I was relying on that, since I shipped PyTorch with a full CUDA toolkit, so it didn't matter. But now that PyTorch uses pip as its primary distribution mechanism, it's no longer a simple add-on to get NVCC; it's more to the tune of several gigabytes on each end user's machine. As an aside, I'm reducing my reliance on conda because it does not handle updates too well, easily spiking to 8 GB of RAM usage and taking up to an hour to remove a package. That's why I'm not just adding back NVCC.

rsxdalv avatar May 17 '25 01:05 rsxdalv

I guess my hope is that, since I'm running it smoothly with a JIT install on a fairly standard setup, I should be able to precompile those functions for everyone.

rsxdalv avatar May 17 '25 01:05 rsxdalv