[REQUEST] Hey, Microsoft...Could you PLEASE Support Your Own OS?

Open d8ahazard opened this issue 2 years ago • 33 comments

While "I get it"...I really don't get why this still doesn't even have BASIC Windows support.

It is published by Microsoft, right?

Compiling from source on windoze doesn't actually seem to generate a .whl file so it could be re-distributed or something.

Pulling from pip throws any number of errors, from ADAM not being supported because it requires 'lscpu' to outright failure because libaio.so can't be found.

Meaning that, for the past several years, this M$-produced piece of software has been mostly useless on the OS they create.

This is one of the most annoying things about Python in general. "It's soooo cross-platform". Until you need a specific library, and realize it was really only ever developed for Linux users until someone threw a slug in the readme about how it MIGHT work with windows, but only if you do a hundred backflips while wearing a blue robe and sacrifice a chicken to Cthulhu.

Python does still support releasing different packages for different operating systems, right?

If that's still true, then it would be fantastic if someone out there could release a proper .whl to PyPI for us second-class Windoze users. I really don't feel like spending the next several hours trying to upgrade my instance of WSL2 to the right version that won't lose its mind if I try to use a specific amount of RAM...
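On the per-OS packaging question: wheel filenames do carry a platform tag, which is how an index can serve a different build to each operating system. The sketch below splits a wheel filename into its dash-separated fields. The helper name `parse_wheel_filename` is hypothetical, and the example filename is the one posted later in this thread, not an official release.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel filename into its dash-separated fields.

    Layout assumed: name-version[-build]-python-abi-platform.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields: " + filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # The build tag is optional and only present in the 6-field form.
        "build": parts[2] if len(parts) == 6 else None,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("deepspeed-0.8.0+cd271a4a-cp310-cp310-win_amd64.whl")
print(info["platform"])
```

A Windows-specific wheel would therefore be distinguishable from a Linux one purely by that last field.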

d8ahazard avatar Oct 15 '22 14:10 d8ahazard

I mean, this only has open issues for the past two years or more... #435, #1189, #1631, #1769, #2099, #2191, #2406

d8ahazard avatar Oct 15 '22 14:10 d8ahazard

+1

DeepSpeed is nearly (if not entirely) impossible to install on Windows.

n00mkrad avatar Oct 16 '22 12:10 n00mkrad

We hear you. Please try #2428

tjruwase avatar Oct 16 '22 19:10 tjruwase

Hi @n00mkrad and @d8ahazard,

I wonder if you have any update on whether this PR solved the Windows installation issue? Thanks, Reza

RezaYazdaniAminabadi avatar Oct 20 '22 04:10 RezaYazdaniAminabadi

Nope.

Trying to run it in VS Powershell:

UserWarning: It seems that the VC environment is activated but DISTUTILS_USE_SDK is not set.This may lead to multiple activations of the VC env.Please set `DISTUTILS_USE_SDK=1` and try again.

Trying to run in CMD:

D:\Temp\Setup\DeepSpeed-eltonz-fix-win-build\csrc\includes\StopWatch.h(3): fatal error C1083: Cannot open include file: 'windows.h': No such file or directory
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

n00mkrad avatar Oct 20 '22 12:10 n00mkrad

Solved this by installing the windows 10 SDK...but this is also precisely what I'm grumbling about.

Even after getting it to compile, there's no /dist folder and no .whl file, despite the setup.py file clearly indicating this is what should happen.

The .bat file is calling python setup.py bdist_wheel...yet we get a .egg-info file.

If I edit the bat to call pip install setup.py, it gets really mad at me...can't find the error it throws ATM.

Like, within the app where I'm trying to use deepspeed, I can easily do a try: / import deepspeed to determine if that dependency exists. Why can't the setup.py script do the same for ops that may be unavailable in Windoze?
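For reference, the try/import check described above might look roughly like this. It is a minimal sketch of the application-side pattern only; the names `HAVE_DEEPSPEED` and `has_module` are hypothetical, and nothing here reflects how DeepSpeed's own setup.py is written.

```python
import importlib.util

# Pattern 1: attempt the import and remember whether it worked.
try:
    import deepspeed  # optional dependency
    HAVE_DEEPSPEED = True
except ImportError:
    deepspeed = None
    HAVE_DEEPSPEED = False


# Pattern 2: probe for a top-level module without importing it.
def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


print("deepspeed importable:", HAVE_DEEPSPEED)
print("json found:", has_module("json"))
```

The application can then branch on the flag and fall back to a code path that does not need the optional package.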

Last - when I do finally jump through all the hoops and get setup.py to create something in the /build folder, I have to manually spoof the whl-info directory in order for accelerate to recognize this, and even then, it refuses to load due to a missing module.

"Distributed package doesn't have MPI built in. MPI is only included if you build PyTorch from source on a host that has MPI installed."

d8ahazard avatar Oct 20 '22 13:10 d8ahazard

@tjruwase @RezaYazdaniAminabadi Hi, can DeepSpeed work without libaio? If the answer is no, there is no way to run DeepSpeed on Windows, right?

camenduru avatar Oct 23 '22 16:10 camenduru

@d8ahazard, yes DeepSpeed can work without libaio. This library is only used by zero-infinity and zero-inference.
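As an illustration of opting out of the libaio-dependent op, the sketch below prepares an environment with the two switches named in this thread (`DS_BUILD_AIO` and `DS_BUILD_SPARSE_ATTN`) set to 0 and builds the matching pip command. The helper names are hypothetical, the install itself is left commented out, and the thread does not establish that these flags are sufficient on native Windows.

```python
import os
import sys

# Switches named in this thread and in the installer's own warnings.
DISABLED_OPS = {"DS_BUILD_AIO": "0", "DS_BUILD_SPARSE_ATTN": "0"}


def build_env(base=None):
    """Return a copy of the environment with the optional ops switched off."""
    env = dict(os.environ if base is None else base)
    env.update(DISABLED_OPS)
    return env


def pip_command(package="deepspeed"):
    """Command line that installs `package` with the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


env = build_env()
print(" ".join(pip_command()))
print({key: env[key] for key in DISABLED_OPS})

# To actually run the install (deliberately not done here):
#   import subprocess
#   subprocess.run(pip_command(), env=env, check=True)
```

Passing the variables through `env=` keeps them scoped to the one install process instead of changing the whole shell session.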

tjruwase avatar Oct 23 '22 18:10 tjruwase

@tjruwase thanks ❤️ If we don't need libaio, why do I get this error?

LINK : fatal error LNK1181: cannot open input file 'aio.lib'

set DS_BUILD_AIO=0
set DS_BUILD_SPARSE_ATTN=0

camenduru avatar Oct 23 '22 19:10 camenduru

Did Microsoft really consider adapting to windows when developing it? When I start pytorch, it forces linking a GPU with nccl even though I train under cpu only

As we all know, nccl cannot be used on win fucking at all
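For context, PyTorch's torch.distributed also ships a gloo backend, which is available on Windows, so a script can choose its backend by platform instead of assuming NCCL. The sketch below shows only that selection logic; `pick_backend` is a hypothetical helper, and whether DeepSpeed honors such a choice on Windows is not confirmed anywhere in this thread.

```python
import platform


def pick_backend(system=None, cuda_available=False):
    """Choose a torch.distributed backend name for the given platform.

    NCCL is only chosen on Linux with CUDA; everything else gets gloo.
    """
    system = system or platform.system()
    if system == "Linux" and cuda_available:
        return "nccl"
    return "gloo"


print(pick_backend("Windows", cuda_available=True))
print(pick_backend("Linux", cuda_available=True))
```

The returned string is the kind of value that `torch.distributed.init_process_group(backend=...)` accepts.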

ChenYFan avatar Oct 24 '22 08:10 ChenYFan

working with WSL 🎉

- Windows 11 22H2
- Ubuntu 22.04
- Linux PC 5.15.68.1-microsoft-standard-WSL2

camenduru avatar Oct 24 '22 15:10 camenduru

How did you resolve the libaio link error?

tjruwase avatar Oct 24 '22 15:10 tjruwase

So it's still not working on Windows.

WSL is not always an option depending on the use case.

n00mkrad avatar Oct 24 '22 15:10 n00mkrad

@tjruwase I can't manage to run it on native Windows. 😭 Ubuntu already comes with libaio, and this issue helped a lot: https://github.com/huggingface/diffusers/issues/807

camenduru avatar Oct 24 '22 15:10 camenduru

@camenduru, can you share the log of the link error? Thanks!

tjruwase avatar Oct 24 '22 16:10 tjruwase

@tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3 errors coming from:

https://github.com/pytorch/pytorch/pull/81642 (this one looks serious) 🥵 https://github.com/pytorch/pytorch/blob/v1.12.1/c10/util/safe_numerics.h

const char *cusparseGetErrorString(cusparseStatus_t status); https://github.com/pytorch/pytorch/blob/v1.12.1/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cpp

is this one necessary? [WARNING] please install triton==1.0.0 if you want to use sparse attention (Supported Platforms: Linux) https://github.com/openai/triton/

camenduru avatar Oct 24 '22 21:10 camenduru

error C3861: '_addcarry_u64': identifier not found. This one is very interesting: it is in the list 🤷

camenduru avatar Oct 24 '22 21:10 camenduru

@camenduru for wsl2, is it passing the pytest-3 tests/unit and other tests? I got it compiled on wsl2 but it is failing almost every test due to nccl issues.

If you could provide details about your installation and whether you are passing the unit tests, it would be appreciated.

Thomas-MMJ avatar Oct 31 '22 20:10 Thomas-MMJ

@Thomas-MMJ DeepSpeed was very slow with WSL2 and I deleted everything, sorry I can't help 😞 We need a working DeepSpeed on native Windows, maybe a year from now, idk. Also, why are we putting Linux KVM between the GPU and CPU? We will lose ~5%, right?

camenduru avatar Nov 01 '22 05:11 camenduru

@tjruwase https://gist.github.com/camenduru/c9a2d97f229b389fed0b1ad561a243d3 errors coming from:

I think the problem is that it is trying to build all the ops because of the following environment variable setting: [image]

Can you try setting that env var to zero?

tjruwase avatar Nov 01 '22 21:11 tjruwase

Have you tried using ChatGPT to solve it? One of the other requirements is Triton, and a Russian managed to build a working 2.0 version for Windows a couple of days ago, but ChatGPT could likely find the other holes keeping it from building properly.

PleezDeez avatar Jan 12 '23 16:01 PleezDeez

Well, if anyone feels like tinkering around with this, here's a whl that installs deepspeed version 0.8.0 on Windows: https://transfer.sh/eDLOMJ/deepspeed-0.8.0+cd271a4a-cp310-cp310-win_amd64.whl It requires the cracked triton 2.0.0 whl first, with the files from its folder dropped into the triton folder in xformers, before it will install, but it installs... Here's the triton whl: https://transfer.sh/me0xpC/triton-2.0.0-cp310-cp310-win_amd64.whl

PleezDeez avatar Jan 15 '23 22:01 PleezDeez

It'll throw up c10d flags looking for NCCL, which is Linux-only, when turned on. But this is an issue with either accelerate or my computer, because I get the same error when trying to turn on any sort of distributed training at all in Windows. I don't know if I possess the coding knowledge to fix it, so I leave it up to y'all.

PleezDeez avatar Jan 16 '23 00:01 PleezDeez

Oh, and it'll error out during accelerate config after you say no to the question about using a deepspeed json file, but I got around this by replacing the accelerate config file in Windows with a config file I made in WSL.

PleezDeez avatar Jan 16 '23 00:01 PleezDeez

I must point out that those wheel links redirect to Not Found

78Alpha avatar Feb 01 '23 03:02 78Alpha

Wait, so DeepSpeed is a Microsoft project, and it can't be used on Windows?

JeffMII avatar Mar 14 '23 21:03 JeffMII

Not without compiling it yourself, sacrificing three chickens to the dark lord Cthulhu, and playing "Hit me baby one more time" on reverse.

d8ahazard avatar Mar 15 '23 12:03 d8ahazard

Oh no 😐 I was playing the wrong song.

camenduru avatar Mar 15 '23 13:03 camenduru

So, on windows 10, when I do:

pip install deepspeed                                                                               
Collecting deepspeed
  Using cached deepspeed-0.8.3.tar.gz (765 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [16 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\i\AppData\Local\Temp\pip-install-97anxpmj\deepspeed_629338d4deb54654aba44efd0bf8dab4\setup.py", line 48, in abort
          assert False, msg
      AssertionError: Unable to pre-compile async_io
      [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  One can disable async_io with DS_BUILD_AIO=0
       [ERROR]  Unable to pre-compile async_io
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

When I set DS_BUILD_AIO=0, I get a bunch of "lscpu command is not available" warnings. I suppose for now it won't get any better with DS_BUILD_SPARSE_ATTN=0?

pip install deepspeed
Collecting deepspeed
  Using cached deepspeed-0.8.3.tar.gz (765 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [31 lines of output]
      test.c
      LINK : fatal error LNK1181: cannot open input file 'aio.lib'
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      The system cannot find the file specified.
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 156, in <module>
          abort(f"Unable to pre-compile {op_name}")
        File "C:\Users\i\AppData\Local\Temp\pip-install-a7n_s6ma\deepspeed_e6ef7efe0142466088802e0aca58350e\setup.py", line 48, in abort    
          assert False, msg
      AssertionError: Unable to pre-compile sparse_attn
      [WARNING] Torch did not find cuda available, if cross-compiling or running with cpu only you can ignore this message. Adding compute capability for Pascal, Volta, and Turing (compute capabilities 6.0, 6.1, 6.2)
      DS_BUILD_OPS=1
       [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
       [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
       [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adagrad requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  cpu_adam requires the 'lscpu' command, but it does not exist!
       [WARNING]  cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
       [WARNING]  sparse_attn cuda is not available from torch
       [WARNING]  sparse_attn requires a torch version >= 1.5 but detected 2.0
       [WARNING]  please install triton==1.0.0 if you want to use sparse attention
       [WARNING]  One can disable sparse_attn with DS_BUILD_SPARSE_ATTN=0
       [ERROR]  Unable to pre-compile sparse_attn
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

yadalik avatar Mar 31 '23 16:03 yadalik

Same problem. There seems to be no way to solve it, but it works fine on Linux...

Trace2333 avatar Apr 02 '23 15:04 Trace2333