DeepSpeed
[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS?
While "I get it"...I really don't get why this still doesn't even have BASIC Windows support.
It is published by Microsoft, right?
Compiling from source on windoze doesn't actually seem to generate a .whl file so it can be redistributed or anything.
Pulling from PIP throws any number of errors, from ADAM not being supported because it requires 'lscpu', to the build just failing because libaio.so can't be found.
Meaning that for the past several years, this M$-produced piece of software has been mostly useless on the OS they create.
This is one of the most annoying things about Python in general. "It's soooo cross-platform". Until you need a specific library, and realize it was really only ever developed for Linux users until someone threw a slug in the readme about how it MIGHT work with windows, but only if you do a hundred backflips while wearing a blue robe and sacrifice a chicken to Cthulhu.
Python does still support releasing different packages for different operating systems, right?
If that's still true, then it would be fantastic if someone out there could release a proper .whl to pypi for us second-class Windoze users. I really don't feel like spending the next several hours trying to upgrade my instance of WSL2 to the right version that won't lose its mind if I try to use a specific amount of RAM...
I mean, there have been open issues about this for the past two years or more... #435, #1189, #1631, #1769, #2099, #2191, #2406
+1
DeepSpeed is nearly (if not entirely) impossible to install on Windows.
We hear you. Please try #2428
Hi @n00mkrad and @d8ahazard,
I wonder if you have any update on whether this PR solved the Windows installation issue? Thanks, Reza
Nope.
Trying to run it in VS Powershell:
UserWarning: It seems that the VC environment is activated but DISTUTILS_USE_SDK is not set. This may lead to multiple activations of the VC env. Please set `DISTUTILS_USE_SDK=1` and try again.
Trying to run in CMD:
D:\Temp\Setup\DeepSpeed-eltonz-fix-win-build\csrc\includes\StopWatch.h(3): fatal error C1083: Cannot open include file: 'windows.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2
Solved this by installing the windows 10 SDK...but this is also precisely what I'm grumbling about.
Even after getting it to compile, there's no /dist folder and no .whl file, despite the setup.py file clearly indicating this is what should happen.
The .bat file is calling python setup.py bdist_wheel...yet we get a .egg-info folder.
If I edit the bat to call pip install setup.py, it gets really mad at me...can't find the error it throws ATM.
Like, within the app where I'm trying to use deepspeed, I can easily wrap import deepspeed in a try/except to determine whether that dependency exists. Why can't the setup.py script do the same for ops that may be unavailable on Windoze?
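Something like this is all I mean (a minimal sketch, assuming nothing beyond the import name; HAS_DEEPSPEED is just an illustrative name):

try:
    import deepspeed  # may be missing or broken on Windows
    HAS_DEEPSPEED = True
except ImportError:
    deepspeed = None
    HAS_DEEPSPEED = False

# Callers can then gate DeepSpeed-only code paths instead of crashing.
if HAS_DEEPSPEED:
    print("deepspeed", deepspeed.__version__, "is available")
else:
    print("deepspeed is not available; running without it")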
Last - when I do finally jump through all the hoops and get setup.py to create something in the /build folder, I have to manually spoof the whl-info directory in order for accelerate to recognize this, and even then, it refuses to load due to a missing module.
"Distributed package doesn't have MPI built in. MPI is only included if you build PyTorch from source on a host that has MPI installed."
@tjruwase @RezaYazdaniAminabadi Hi,
Can DeepSpeed work without libaio? If the answer is no, there is no way to run DeepSpeed on Windows, right?
@d8ahazard, yes, DeepSpeed can work without libaio. This library is only used by ZeRO-Infinity and ZeRO-Inference.
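For illustration, here is a rough sketch of a config that stays off that path (the keys are standard DeepSpeed config options, but the values are placeholders, not a recommendation):

# Hypothetical example: ZeRO stage 2 with no NVMe offload configured, so the
# libaio-backed async_io op (used by ZeRO-Infinity / ZeRO-Inference NVMe
# offload) is never needed.
ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2
        # no "offload_param" / "offload_optimizer" with "device": "nvme" here
    },
}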
@tjruwase thanks ❤️ If we don't need libaio, why do we get this error: LINK : fatal error LNK1181: cannot open input file 'aio.lib'?
set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0
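Or, doing the same thing from a small Python helper instead of the shell (a sketch; it assumes the DS_BUILD_* flags above are honored by the setup.py of the version being installed):

import os
import subprocess
import sys

# Skip the ops that need libaio and triton, then install from PyPI.
env = dict(os.environ, DS_BUILD_AIO="0", DS_BUILD_SPARSE_ATTN="0")
subprocess.check_call([sys.executable, "-m", "pip", "install", "deepspeed"], env=env)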
Did Microsoft really consider adapting this to Windows when developing it? When I start PyTorch, it forces linking to a GPU with NCCL even though I'm training on CPU only.
As we all know, NCCL can't fucking be used on Windows at all.
working with WSL 🎉
- Windows 11 22H2
- Ubuntu 22.04
- Linux PC 5.15.68.1-microsoft-standard-WSL2
How did you resolve the libaio link error?
So it's still not working on Windows.
WSL is not always an option depending on the use case.
@tjruwase I can't manage to get it running on native Windows 😭 Ubuntu already comes with libaio, and this issue helped a lot:
https://github.com/huggingface/diffusers/issues/807
@camenduru, can you share the log of the link error? Thanks!
@tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3 errors coming from:
https://github.com/pytorch/pytorch/pull/81642 (this one looks serious) 🥵 https://github.com/pytorch/pytorch/blob/v1.12.1/c10/util/safe_numerics.h
const char *cusparseGetErrorString(cusparseStatus_t status);
https://github.com/pytorch/pytorch/blob/v1.12.1/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp
is this one necessary? [WARNING] please install triton==1.0.0 if you want to use sparse attention (Supported Platforms: Linux) https://github.com/openai/triton/
error C3861: '_addcarry_u64': identifier not found
This one is very interesting, it is in the list 🤷
@camenduru for WSL2, is it passing the pytest-3 tests/unit and other tests? I got it compiled on WSL2 but it is failing almost every test due to NCCL issues.
If you could provide details on your installation and whether you are passing the unit tests, it would be appreciated.
@Thomas-MMJ DeepSpeed was very slow with WSL2 and I deleted everything, sorry I can't help 😞 We need a working DeepSpeed on native Windows, maybe a year from now, idk. Also, why are we putting a Linux KVM between the GPU and CPU? We'll lose ~5%, right?
@camenduru, I think the problem is that it is trying to build all the ops because of the following environment variable setting: DS_BUILD_OPS=1.
Can you try setting that env var to zero?
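In other words, roughly this (a sketch; it assumes DS_BUILD_OPS=0 makes the build skip pre-compiling ops, leaving them to be JIT-compiled at runtime only if they are ever requested):

import os
import subprocess
import sys

# Pre-build no ops at install time (the opposite of DS_BUILD_OPS=1, which
# forces building all ops); run this from a DeepSpeed source checkout.
env = dict(os.environ, DS_BUILD_OPS="0")
subprocess.check_call([sys.executable, "-m", "pip", "install", "."], env=env)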
Have you tried using ChatGPT to solve it? One of the other requirements is Triton, and a Russian developer managed to build a working 2.0 version for Windows a couple of days ago, but ChatGPT could likely find the other holes keeping it from building properly.
Well, if anyone feels like tinkering around with this, here's a whl that installs deepspeed version 0.8.0 on Windows: https://transfer.sh/eDLOMJ/deepspeed-0.8.0+cd271a4a-cp310-cp310-win_amd64.whl It requires the cracked triton 2.0.0 whl first, with the files from its folder dropped into the triton folder in xformers, before it will install... but it installs. Here's the triton whl: https://transfer.sh/me0xpC/triton-2.0.0-cp310-cp310-win_amd64.whl
It'll throw up c10d flags looking for NCCL (which is Linux-only) when turned on, but this is an issue with either accelerate or my computer, because I get the same error when trying to turn on any sort of distributed training at all in Windows. I don't know if I possess the coding knowledge to fix it, so I leave it up to y'all.
Oh, and it'll error out during accelerate config after you answer no to the question about a deepspeed json file you'd like to use, but I got around this by replacing the accelerate config file in Windows with a config file I made in WSL.
I must point out that those wheel links redirect to Not Found
Wait, so DeepSpeed is a Microsoft project, and it can't be used on Windows?
Not without compiling it yourself, sacrificing three chickens to the dark lord Cthulhu, and playing "Hit me baby one more time" in reverse.
Oh no 😐 I was playing the wrong song.
So, on windows 10, when I do:
pip install deepspeed
Collecting deepspeed
Using cached deepspeed-0.8.3.tar.gz (765 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [16 lines of output]
test.c
LINK : fatal error LNK1181: cannot open input file "aio.lib"
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 156, in <module>
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 48, in abort
assert False, msg
AssertionError: Unable to pre-compile async_io
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] One can disable async_io with DS_BUILD_AIO=0
[ERROR] Unable to pre-compile async_io
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
When I set DS_BUILD_AIO=0, I get a bunch of warnings that the lscpu command is not available. I suppose for now it's not getting any better with DS_BUILD_SPARSE_ATTN=0 either?:
pip install deepspeed
Collecting deepspeed
Using cached deepspeed-0.8.3.tar.gz (765 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [31 lines of output]
test.c
LINK : fatal error LNK1181: cannot open input file "aio.lib"
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 156, in <module>
abort(f"Unable to pre-compile {op_name}")
File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 48, in abort
assert False, msg
AssertionError: Unable to pre-compile sparse_attn
[WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] sparse_attn cuda is not available from torch
[WARNING] sparse_attn requires a torch version >= 1.5 but detected 2.0
[WARNING] please install triton==1.0.0 if you want to use sparse attention
[WARNING] One can disable sparse_attn with DS_BUILD_SPARSE_ATTN=0
[ERROR] Unable to pre-compile sparse_attn
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Same problem, seems there's no way to solve it, but it works fine on Linux...