DeepSpeed
[BUG] oneapi/ccl.hpp: No such file or directory.
Describe the bug
The builds on conda-forge have been failing since deepspeed=0.14.1 for CUDA 11.8 and 12.0 with an error like fatal error: oneapi/ccl.hpp: No such file or directory. Originally reported at https://github.com/conda-forge/deepspeed-feedstock/pull/56#issuecomment-2062611899.
To Reproduce
Steps to reproduce the behavior:
- Go to https://github.com/conda-forge/deepspeed-feedstock/pull/57 and clone the branch
- Run `python build_locally.py` locally and select the option with CUDA 11.8 and Python 3.9
- See the error below
Expected behavior
CUDA builds work as expected.
ds_report output
Note: this isn't the exact report from the conda-forge CI device; I copied it from the CPU build logs.
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [FAIL]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
deepspeed_not_implemented [NO] ....... [OKAY]
deepspeed_ccl_comm ..... [NO] ....... [OKAY]
deepspeed_shm_comm ..... [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['$PREFIX/lib/python3.9/site-packages/torch']
torch version .................... 2.3.0.post101
deepspeed install path ........... ['$PREFIX/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.14.3, f492cfc, HEAD
deepspeed wheel compiled w. ...... torch 0.0
shared memory (/dev/shm) size .... 64.00 MB
Screenshots
Truncated traceback from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=953875&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1&l=3486:
2024-06-12T22:49:04.3043574Z building 'deepspeed.ops.comm.deepspeed_ccl_comm_op' extension
2024-06-12T22:49:04.3043994Z creating build/temp.linux-x86_64-cpython-39
2024-06-12T22:49:04.3053293Z creating build/temp.linux-x86_64-cpython-39/csrc
2024-06-12T22:49:04.3054113Z creating build/temp.linux-x86_64-cpython-39/csrc/cpu
2024-06-12T22:49:04.3054729Z creating build/temp.linux-x86_64-cpython-39/csrc/cpu/comm
2024-06-12T22:49:04.3071806Z /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fPIC -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work=/usr/local/src/conda/deepspeed-0.14.3 -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p=/usr/local/src/conda-prefix -isystem /usr/local/cuda/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include -isystem /usr/local/cuda/include -fPIC 
-I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/csrc/cpu/includes -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/TH -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.9/site-packages/torch/include/THC -I/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/include/python3.9 -c csrc/cpu/comm/ccl.cpp -o build/temp.linux-x86_64-cpython-39/csrc/cpu/comm/ccl.o -O2 -fopenmp -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -DTORCH_EXTENSION_NAME=deepspeed_ccl_comm_op -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2024-06-12T22:49:08.2062484Z csrc/cpu/comm/ccl.cpp:8:10: fatal error: oneapi/ccl.hpp: No such file or directory
2024-06-12T22:49:08.2067800Z 8 | #include <oneapi/ccl.hpp>
2024-06-12T22:49:08.2068222Z | ^~~~~~~~~~~~~~~~
2024-06-12T22:49:08.2068507Z compilation terminated.
2024-06-12T22:49:08.2182741Z error: command '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_build_env/bin/x86_64-conda-linux-gnu-cc' failed with exit code 1
2024-06-12T22:49:08.6174937Z error: subprocess-exited-with-error
2024-06-12T22:49:08.6176666Z
2024-06-12T22:49:08.6188012Z × python setup.py bdist_wheel did not run successfully.
2024-06-12T22:49:08.6227487Z │ exit code: 1
2024-06-12T22:49:08.6240717Z ╰─> See above for output.
2024-06-12T22:49:08.6252920Z
2024-06-12T22:49:08.6264017Z note: This error originates from a subprocess, and is likely not a problem with pip.
2024-06-12T22:49:08.6271330Z full command: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/bin/python -u -c '
2024-06-12T22:49:08.6272043Z exec(compile('"'"''"'"''"'"'
2024-06-12T22:49:08.6277838Z # This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
2024-06-12T22:49:08.6283726Z #
2024-06-12T22:49:08.6284428Z # - It imports setuptools before invoking setup.py, to enable projects that directly
2024-06-12T22:49:08.6289287Z # import from `distutils.core` to work with newer packaging standards.
2024-06-12T22:49:08.6289949Z # - It provides a clear error message when setuptools is not installed.
2024-06-12T22:49:08.6295383Z # - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
2024-06-12T22:49:08.6295837Z # setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
2024-06-12T22:49:08.6301077Z # manifest_maker: standard file '"'"'-c'"'"' not found".
2024-06-12T22:49:08.6307069Z # - It generates a shim setup.py, for handling setup.cfg-only projects.
2024-06-12T22:49:08.6307810Z import os, sys, tokenize
2024-06-12T22:49:08.6314125Z
2024-06-12T22:49:08.6314907Z try:
2024-06-12T22:49:08.6320956Z import setuptools
2024-06-12T22:49:08.6321316Z except ImportError as error:
2024-06-12T22:49:08.6325049Z print(
2024-06-12T22:49:08.6326023Z "ERROR: Can not execute `setup.py` since setuptools is not available in "
2024-06-12T22:49:08.6335095Z "the build environment.",
2024-06-12T22:49:08.6335348Z file=sys.stderr,
2024-06-12T22:49:08.6338543Z )
2024-06-12T22:49:08.6338832Z sys.exit(1)
2024-06-12T22:49:08.6339045Z
2024-06-12T22:49:08.6339554Z __file__ = %r
2024-06-12T22:49:08.6340070Z sys.argv[0] = __file__
2024-06-12T22:49:08.6340336Z
2024-06-12T22:49:08.6340562Z if os.path.exists(__file__):
2024-06-12T22:49:08.6340835Z filename = __file__
2024-06-12T22:49:08.6341059Z with tokenize.open(__file__) as f:
2024-06-12T22:49:08.6341411Z setup_py_code = f.read()
2024-06-12T22:49:08.6341621Z else:
2024-06-12T22:49:08.6341993Z filename = "<auto-generated setuptools caller>"
2024-06-12T22:49:08.6342280Z setup_py_code = "from setuptools import setup; setup()"
2024-06-12T22:49:08.6342576Z
2024-06-12T22:49:08.6342895Z exec(compile(setup_py_code, filename, "exec"))
2024-06-12T22:49:08.6343569Z '"'"''"'"''"'"' % ('"'"'/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-v4pibtb1
2024-06-12T22:49:08.6344021Z cwd: /home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/
2024-06-12T22:49:08.6344416Z Building wheel for deepspeed (setup.py): finished with status 'error'
2024-06-12T22:49:08.6348857Z ERROR: Failed building wheel for deepspeed
2024-06-12T22:49:08.6349126Z Running setup.py clean for deepspeed
2024-06-12T22:49:08.6349635Z Running command python setup.py clean
2024-06-12T22:49:10.9424015Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2024-06-12T22:49:10.9498275Z [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
2024-06-12T22:49:10.9509931Z DS_BUILD_OPS=1
2024-06-12T22:49:17.9894660Z Install Ops={'deepspeed_not_implemented': 1, 'deepspeed_ccl_comm': 1, 'deepspeed_shm_comm': 1, 'cpu_adam': 1, 'fused_adam': 1}
2024-06-12T22:49:18.0269777Z version=0.14.3, git_hash=f492cfc, git_branch=HEAD
2024-06-12T22:49:18.0270870Z install_requires=['hjson', 'ninja', 'numpy', 'nvidia-ml-py', 'packaging>=20.0', 'psutil', 'py-cpuinfo', 'pydantic', 'torch', 'tqdm']
2024-06-12T22:49:18.0278897Z ext_modules=[<setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_not_implemented_op') at 0x7ff248dbe460>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_ccl_comm_op') at 0x7ff248dbe4c0>, <setuptools.extension.Extension('deepspeed.ops.comm.deepspeed_shm_comm_op') at 0x7ff248dbe520>, <setuptools.extension.Extension('deepspeed.ops.adam.cpu_adam_op') at 0x7ff16dd51b80>, <setuptools.extension.Extension('deepspeed.ops.adam.fused_adam_op') at 0x7ff16dd51d90>]
2024-06-12T22:49:18.0651351Z running clean
2024-06-12T22:49:18.0714575Z removing 'build/temp.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.0715596Z removing 'build/lib.linux-x86_64-cpython-39' (and everything under it)
2024-06-12T22:49:18.1133919Z 'build/bdist.linux-x86_64' does not exist -- can't clean it
2024-06-12T22:49:18.1143315Z 'build/scripts-3.9' does not exist -- can't clean it
2024-06-12T22:49:18.1151899Z removing 'build'
2024-06-12T22:49:18.1171105Z deepspeed build time = 0.08735942840576172 secs
2024-06-12T22:49:18.5956605Z Failed to build deepspeed
2024-06-12T22:49:18.5973299Z ERROR: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.5973660Z Exception information:
2024-06-12T22:49:18.5986009Z Traceback (most recent call last):
2024-06-12T22:49:18.5993394Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
2024-06-12T22:49:18.5993851Z status = run_func(*args)
2024-06-12T22:49:18.5994316Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
2024-06-12T22:49:18.5995399Z return func(self, options, args)
2024-06-12T22:49:18.5999462Z File "$PREFIX/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 429, in run
2024-06-12T22:49:18.5999708Z raise InstallationError(
2024-06-12T22:49:18.6006202Z pip._internal.exceptions.InstallationError: Could not build wheels for deepspeed, which is required to install pyproject.toml-based projects
2024-06-12T22:49:18.6010801Z Removed build tracker: '/tmp/pip-build-tracker-gvbib0oo'
2024-06-12T22:49:20.4656669Z Traceback (most recent call last):
2024-06-12T22:49:20.4664375Z File "/opt/conda/bin/conda-build", line 11, in <module>
2024-06-12T22:49:20.4669893Z sys.exit(execute())
2024-06-12T22:49:20.4670437Z File "/opt/conda/lib/python3.10/site-packages/conda_build/cli/main_build.py", line 590, in execute
2024-06-12T22:49:20.4677725Z api.build(
2024-06-12T22:49:20.4678886Z File "/opt/conda/lib/python3.10/site-packages/conda_build/api.py", line 250, in build
2024-06-12T22:49:20.4685860Z return build_tree(
2024-06-12T22:49:20.4691479Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 3638, in build_tree
2024-06-12T22:49:20.4708481Z packages_from_this = build(
2024-06-12T22:49:20.4713969Z File "/opt/conda/lib/python3.10/site-packages/conda_build/build.py", line 2506, in build
2024-06-12T22:49:20.4714313Z utils.check_call_env(
2024-06-12T22:49:20.4724506Z File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 405, in check_call_env
2024-06-12T22:49:20.4729616Z return _func_defaulting_env_to_os_environ("call", *popenargs, **kwargs)
2024-06-12T22:49:20.4730205Z File "/opt/conda/lib/python3.10/site-packages/conda_build/utils.py", line 381, in _func_defaulting_env_to_os_environ
2024-06-12T22:49:20.4735612Z raise subprocess.CalledProcessError(proc.returncode, _args)
2024-06-12T22:49:20.4736400Z subprocess.CalledProcessError: Command '['/bin/bash', '-o', 'errexit', '/home/conda/feedstock_root/build_artifacts/deepspeed_1718232254780/work/conda_build.sh']' returned non-zero exit status 1.
2024-06-12T22:49:30.4784588Z
2024-06-12T22:49:30.5793301Z ##[error]Bash exited with code '1'.
2024-06-12T22:49:30.5974127Z ##[section]Finishing: Run docker build
System info (please complete the following information):
- OS: Ubuntu 22.04.4
- GPU count and types: 1 NVIDIA GPU
- Interconnects (if applicable): N/A
- Python version: 3.9
- Any other relevant info about your setup: None
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else? No
Docker context Are you using a specific docker image that you can share?
quay.io/condaforge/linux-anvil-cuda:11.8
Additional context
The builds have been failing in these PRs as well:
- deepspeed 0.14.2 - https://github.com/conda-forge/deepspeed-feedstock/pull/57#issuecomment-2078635322
- deepspeed 0.14.3 - https://github.com/conda-forge/deepspeed-feedstock/pull/62
Thanks @weiji14 for opening this to track.
Hello, any update on this issue?
Following the instructions here to install oneccl-devel from Intel:
conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel
solved this problem.
Thanks so much @SnzFor16Min for pointing me to that oneccl-devel package (which is also on the conda-forge channel at https://anaconda.org/conda-forge/oneccl-devel/files). As a temporary workaround, I've managed to build deepspeed=0.14.4 at https://github.com/conda-forge/deepspeed-feedstock/pull/63 by adding oneccl-devel to the host dependencies.
That said, I'm still unsure if this issue should be closed, because this Intel oneAPI Toolkit should only be used for CPU builds and not CUDA (GPU) builds, no? As mentioned at https://github.com/conda-forge/deepspeed-feedstock/pull/56#issuecomment-2065192465:
Do we need to get that oneapi/ccl.hpp file from somewhere? Don't quite get it since these are CUDA (GPU) builds, not CPU builds.
@weiji14 - It looks like this comes from the Intel extensions for pytorch, but we shouldn't need that, and some DeepSpeed tests should have caught that. I'll take a look soon to see if I can tell why we are hitting this here.
Will leave this up to @loadams and the deepspeed team to resolve.
I'm no expert in building DeepSpeed, but as I see DS_BUILD_OPS=1 in the traceback, perhaps @weiji14 you should check if the build script was also pre-compiling CPU ops (e.g., DS_BUILD_CPU_ADAM). This is mentioned in the DeepSpeed documentation and might require oneAPI libraries even if it's a GPU build.
Ah yes, the DS_BUILD_OPS=1 flag is set at https://github.com/conda-forge/deepspeed-feedstock/blame/b0193a708c3f1f6864e2a85f7cbdf92ee3bf39ff/recipe/build.sh#L4-L6, and DS_BUILD_CPU_ADAM might be enabled as a result (I haven't checked those build flags in almost a year). Maybe it doesn't hurt to compile with the CPU ops enabled even on CUDA?
@weiji14 - this should be fine to add to the dependencies, it should not cause any issues on the CUDA builds.
Also, it should be fine to leave DS_BUILD_OPS=1; that should have all ops enabled, including the CPU ops.
I'd say let's leave this open for now. I'll check back to confirm we have no issues reported from users, and we can also confirm the flow works with the next DeepSpeed release.
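For readers puzzling over how DS_BUILD_OPS=1 interacts with per-op flags: DeepSpeed's pre-compile flags follow a convention where a per-op variable overrides the global default. Here's a minimal sketch of that precedence logic, assuming the per-op flag names (e.g. DS_BUILD_CCL_COMM) track the op names; this is an illustration, not DeepSpeed's actual builder code.

```python
import os

def should_build(op_name, default=False):
    """Sketch of DS_BUILD_* precedence: a per-op flag such as
    DS_BUILD_CPU_ADAM overrides the global DS_BUILD_OPS default."""
    per_op = os.environ.get("DS_BUILD_" + op_name.upper())
    if per_op is not None:
        return per_op == "1"
    global_flag = os.environ.get("DS_BUILD_OPS")
    if global_flag is not None:
        return global_flag == "1"
    return default

os.environ["DS_BUILD_OPS"] = "1"
os.environ["DS_BUILD_CCL_COMM"] = "0"  # hypothetical opt-out of the oneCCL op
print(should_build("ccl_comm"))  # False: the per-op flag wins
print(should_build("cpu_adam"))  # True: falls back to DS_BUILD_OPS
```

Under this scheme, a recipe could keep DS_BUILD_OPS=1 while opting out of a single problematic op, rather than disabling pre-compilation entirely.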
C:\ComfyUI\ComfyUI\custom_nodes\EasyAnimate>conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel
Channels:
- https://software.repos.intel.com/python/conda
- conda-forge
- defaults
Platform: win-64
Collecting package metadata (repodata.json): done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
Is this package no longer installable?
@delock, can you help with this?
@whois206 - I had no problems running the command that you listed:
(base) test@deepspeed:~$ conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel
Channels:
- https://software.repos.intel.com/python/conda
- conda-forge
- defaults
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /home/deepspeed/miniconda3
added / updated specs:
- oneccl-devel
The following packages will be downloaded:
package | build
---------------------------|-----------------
_libgcc_mutex-0.1 | conda_forge 3 KB conda-forge
_openmp_mutex-4.5 | 2_gnu 23 KB conda-forge
certifi-2024.8.30 | pyhd8ed1ab_0 160 KB conda-forge
impi_rt-2021.14.0 | intel_790 52.6 MB https://software.repos.intel.com/python/conda
intel-cmplr-lib-rt-2025.0.0| intel_1169 38.3 MB https://software.repos.intel.com/python/conda
intel-cmplr-lib-ur-2025.0.0| intel_1169 5.6 MB https://software.repos.intel.com/python/conda
intel-cmplr-lic-rt-2025.0.0| intel_1169 20 KB https://software.repos.intel.com/python/conda
intel-sycl-rt-2025.0.0 | intel_1169 5.1 MB https://software.repos.intel.com/python/conda
One thing I do notice in your logs is that your platform is win-64. Do you have a Windows node? And if so, could you let us know what accelerator you are trying to use?
I read the initial description. From ds_report it looks like @weiji14 is reporting a non-CUDA system (the ops in ds_report indicate the CPU accelerator is selected). Is this intended?
If running on CPU is intended, this is the guide for install dependencies for Intel CPU. https://github.com/microsoft/DeepSpeed/blob/master/docs/_tutorials/accelerator-setup-guide.md#installation-steps-for-intel-architecture-cpu
If you only run DeepSpeed on a single machine, the steps above should be enough. I'm wondering why the system wants to build deepspeed_ccl_comm with oneCCL; this is usually optional, and by default DeepSpeed only attempts to build deepspeed_ccl_comm when oneccl-binding-pt is installed.
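The "only build deepspeed_ccl_comm when the binding is installed" behavior described above amounts to a package-presence check. A rough stdlib sketch of that idea, where the exact distribution name ("oneccl-binding-pt") is taken from the comment above and is an assumption on my part:

```python
from importlib import metadata

def ccl_comm_would_build():
    """Sketch: treat the deepspeed_ccl_comm op as a build candidate only
    when the oneCCL PyTorch binding package is installed. The package
    name queried here is an assumption, not verified against DeepSpeed."""
    try:
        metadata.version("oneccl-binding-pt")
        return True
    except metadata.PackageNotFoundError:
        return False

print(ccl_comm_would_build())
```

If this check returns False in your environment, that would suggest something else (such as DS_BUILD_OPS=1 forcing all ops) is triggering the build.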
For @whois206's question, I need to check whether there is an oneccl-devel package for Windows. Will answer later.
@whois206 the oneccl-devel package does not have Windows support. I checked the oneCCL product page https://www.intel.com/content/www/us/en/developer/tools/oneapi/oneccl-download.html, and there seems to be no Windows download package.
@loadams this is completely broken. Here's a full failing build pipeline that first fails to install the oneccl-devel package and then runs build_win, which fails with csrc/cpu/comm/ccl.cpp(8): fatal error C1083: Cannot open include file: 'oneapi/ccl.hpp': No such file or directory:
https://github.com/rsxdalv/DeepSpeed/actions/runs/14977741000/job/42074217352#step:10:222
Given that PyTorch now suggests installing with pip by default, it is commonly installed without the CUDA toolkit and nvcc, so the official deepspeed wheels, which rely on JIT compilation, do not work. I'm guessing there might be a way to disable the deepspeed.ops.comm.deepspeed_ccl_comm_op extension or make it JIT-built to prevent the failure, but no explanation or instruction about that exists online. This can break end-user installations: if deepspeed gets installed, transformers models break despite having worked before deepspeed was installed.
My only suggestion is to either install NVCC and deepspeed wheels, or just uninstall deepspeed, which I chose for my project.
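Since the failure mode here is "deepspeed is installed but the JIT toolchain isn't", one defensive option for an application is a preflight check before taking any deepspeed-backed code path. A minimal sketch, assuming JIT op compilation needs nvcc and ninja on PATH (the function name is mine, not a DeepSpeed API):

```python
import shutil

def deepspeed_jit_ready():
    """Rough preflight check: DeepSpeed's JIT op compilation needs the
    CUDA compiler (nvcc) and the ninja build tool on PATH. Returns True
    only when both are found."""
    return shutil.which("nvcc") is not None and shutil.which("ninja") is not None

# An application can fall back to plain transformers when the
# toolchain is missing, instead of failing deep inside a JIT build.
if not deepspeed_jit_ready():
    print("nvcc/ninja not found; running without DeepSpeed")
```

This doesn't fix the wheel-build problem, but it keeps end-user installs from breaking at model-load time.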
@rsxdalv - can you elaborate a little more on what specific failure you are hitting that brought you to this? You are installing the OneAPI libs and then building on Windows, but targeting cuda/nvidia devices (not cpu here)?
The current DeepSpeed-on-Windows whls we publish are really only for Nvidia/NVCC; we don't guarantee support for a broader set of hardware on Windows currently. But I want to understand your use case to better understand what is broken.
Exactly, I want a Nvidia optimized Deepspeed wheel. The currently published wheel requires NVCC, for example in this case:
from transformers import VitsModel
model = VitsModel.from_pretrained(
    "facebook/mms-tts-eng",
)
It will fail if the user has deepspeed installed but no nvcc. Thus I am looking to compile a specific deepspeed wheel with ops precompiled (at least the important ones). I have seen ways to bypass many of the ops, but not the Intel ones. Trying to build the Intel ones anyway, there is a roadblock: the oneccl-devel conda package is linux-64 only. So despite using build_win and my best attempts, I would be forced to build my own oneccl-devel for Windows, which I gave up on.
Looking at this issue and https://github.com/deepspeedai/DeepSpeed/issues/7057, which also fails on oneapi/ccl.hpp, the current status is that Windows builds are somewhere between impossible and undocumented.
I think if there was a way to just say
BUILD_INTEL_STUFF=0
everyone would be happy; my hope is that it's just the deepspeed.ops.comm.deepspeed_ccl_comm_op that causes this issue.
Hi @rsxdalv, you can override DeepSpeed device detection by setting the environment variable DS_ACCELERATOR=cuda before installing and running DeepSpeed.
Besides, can you post the output of pip list in your environment? I wonder what python packages are there that make DeepSpeed select the CPU accelerator, thanks!
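The DS_ACCELERATOR override suggested above can be pictured as a simple "environment variable beats auto-detection" rule. A hedged sketch (the function is mine; DeepSpeed's real selection logic lives in its accelerator module and does more than this):

```python
import os

def pick_accelerator(auto_detected):
    """DS_ACCELERATOR, when set, overrides whatever auto-detection
    chose (e.g. 'cpu' on a box where no CUDA runtime is visible)."""
    return os.environ.get("DS_ACCELERATOR", auto_detected)

os.environ["DS_ACCELERATOR"] = "cuda"
print(pick_accelerator("cpu"))  # cuda: the explicit override wins
```

This is why setting the variable before both install and runtime matters: the build step and the runtime each consult the environment independently.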
Thanks, this time the build worked.
The build environment was basically just:
python -m pip install torch==2.6.0 --index-url https://download.pytorch.org/whl/cu124
python -m pip install wheel ninja numpy psutil build setuptools==72.2.0
Now, I tried to go to the next step and use DS_BUILD_OPS=1 but I got about 1000 lines of warnings and 2000 lines of errors so I'm going to take a break from it. https://github.com/rsxdalv/DeepSpeed/actions/runs/15073236896/job/42374559991#step:11:4957
one suggestion is not to build ops at installation time and let opbuilder build just in time. In that way only essential ops will be built.
I understand that, and before, I was relying on that: I shipped PyTorch with a full CUDA, so it didn't matter. But now that PyTorch uses pip as its primary distribution mechanism, it's no longer a simple add-on to get NVCC; it's more to the tune of several gigabytes on each end user's machine. As an aside, I'm reducing my reliance on conda because it does not handle updates well, easily spiking to 8 GB of RAM usage and taking up to an hour to remove a package. That's why I'm not just adding back NVCC.
I guess my hope is that if I'm running it smoothly with a JIT install on a fairly standard setup, I should hopefully be able to precompile those functions for everyone.