DeepSpeed
[BUG]pip install doesn't work. Please eeelp.
Describe the bug
pip install is not working on Windows 10.
To Reproduce
Steps to reproduce the behavior:
- Go to Command Prompt
- Type in 'pip install deepspeed'
- Voila
Expected behavior
Install DeepSpeed?
System info (please complete the following information):
- OS: Windows 10 Pro
- GPU: RTX 2060 Super
- Screenshot of error
- Python version: 3.10
I've repro'd your issue and will let you know when I have a fix. Our support on Windows is unfortunately not as thoroughly tested as on Linux. I recognize how funny that sounds since we're part of msft haha :)
I repro'd this on a windows box that does not have a GPU. Can you confirm that torch sees your gpu from windows?
Can you share the results of torch.cuda.is_available() and torch.cuda.get_device_properties(0)?
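Both values can be collected in one go with a small script. This is only a sketch: the function name torch_gpu_report is made up for illustration, and it falls back to defaults instead of raising when torch cannot be imported.

```python
import importlib.util


def torch_gpu_report():
    """Return a small dict describing what PyTorch can see, without raising."""
    report = {"torch_installed": False, "cuda_available": False, "device": None}
    if importlib.util.find_spec("torch") is None:
        return report  # torch is not importable in this environment

    import torch

    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        # Same call as requested above; index 0 is the first visible GPU.
        report["device"] = str(torch.cuda.get_device_properties(0))
    return report


for key, value in torch_gpu_report().items():
    print(f"{key}: {value}")
```

Run it inside the same virtual environment that you use for pip install deepspeed, since a different environment may contain a different torch build.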
Hi @jeffra ,
I ran the above torch commands on my Windows machine, and torch can access the GPU inside my virtual environment. However, DeepSpeed still does not install.
@hamzafar, can you share the results of this? For some reason torch might not be detecting your CUDA_HOME path.
python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)"
I’m currently unable to access the machine I was working on. I’ll get back to you by Wednesday.
Thanks for the reply
@jeffra I am getting None.
Could you recommend the solution?
Interesting, do you have nvcc installed somewhere on your machine? This should come with the CUDA toolkit (e.g., https://developer.nvidia.com/cuda-11.3.0-download-archive). If you have nvcc, can you share the version, e.g., with nvcc --version?
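If it is easier to check from Python, something like the following should work. It is a sketch using only the standard library; the helper name nvcc_version is hypothetical, and it only finds nvcc when it is on PATH.

```python
import shutil
import subprocess


def nvcc_version():
    """Return the text printed by `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    # Fall back to stderr in case the tool reports there instead of stdout.
    return result.stdout.strip() or result.stderr.strip()


version = nvcc_version()
print(version if version is not None else "nvcc was not found on PATH")
```

A result of None does not prove the toolkit is missing; it may be installed in a directory that is not on PATH.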
No, I have not installed nvcc on my machine. OK, let me install it and I will share its version.
I suspect that is the source of the issue here: DeepSpeed requires nvcc to be installed to compile our C++/CUDA extensions. I just created a PR (https://github.com/microsoft/DeepSpeed/pull/2519) that adds a note about this to our requirements.
I have configured CUDA on my machine but I still get the same error.
Interesting, glad to see nvcc is coming up okay. This error at install makes me think CUDA_HOME is still returning None from torch though. Does python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)" still print None? Here's the logic that PyTorch uses to find CUDA_HOME: https://github.com/pytorch/pytorch/blob/7ec8a4d2a26f717d0a4073e6005f9edfdd7ab641/torch/utils/cpp_extension.py#L86
Perhaps you can set CUDA_HOME to where your CUDA install is located? I see that on Windows it defaults to 'C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*'
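To see which of these lookups succeeds on a given machine without importing torch, a rough approximation of that search order can be written with the standard library. This is a simplified sketch, not a copy of the PyTorch code: the function name guess_cuda_home is hypothetical, and checking CUDA_PATH and the location of nvcc are assumptions about the lookup order that should be verified against the linked source.

```python
import glob
import os
import shutil
import sys

# Default Windows location quoted in the comment above.
WINDOWS_DEFAULT_GLOB = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*"


def guess_cuda_home(environ=None, platform=None):
    """Approximate the CUDA_HOME lookup; returns a path string or None."""
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    # 1. An explicitly set environment variable wins (CUDA_PATH is an assumption).
    home = environ.get("CUDA_HOME") or environ.get("CUDA_PATH")
    if home:
        return home

    # 2. Otherwise derive it from nvcc on PATH: <home>/bin/nvcc.
    nvcc = shutil.which("nvcc", path=environ.get("PATH", ""))
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. On Windows, fall back to the default install directory.
    if platform == "win32":
        candidates = sorted(glob.glob(WINDOWS_DEFAULT_GLOB))
        if candidates:
            return candidates[-1]
    return None


print(guess_cuda_home())
```

If this prints None as well, setting CUDA_HOME by hand before running pip install is the most direct thing to try.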
System is not accessing the cuda runtime.
What version of torch do you have installed here? Can you show me the exact torch.__version__? I just want to make sure torch's CUDA version is aligned with the CUDA 10.0 you have installed.
It shows '1.13.0+cpu'. That's strange, since I installed the GPU version of PyTorch.
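The suffix after the "+" is what gives it away here. A minimal sketch for reading that suffix from a version string follows; the helper names are hypothetical, and a missing suffix (some builds report a plain version such as '1.13.1') is treated as unknown rather than as a GPU build.

```python
def torch_build_tag(version):
    """Return the part after '+' in a torch version string, or None if absent."""
    _, sep, local = version.partition("+")
    return local if sep else None


def is_cpu_only_build(version):
    """True only when the version string is explicitly tagged as a CPU build."""
    return torch_build_tag(version) == "cpu"


print(is_cpu_only_build("1.13.0+cpu"))
print(torch_build_tag("1.13.1+cu117"))
```

When the suffix is absent, torch.version.cuda is the more reliable check, since it is None on CPU-only builds.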
I will set up GPU PyTorch in a different env; hopefully it will work.
I have set up GPU PyTorch, and I am now getting a new error while installing DeepSpeed.
Hi, I have attached a screenshot of the error. Could you confirm that CUDA is aligned?
@hamzafar it looks like there is an error compiling for sparse_attn. I think this is expected on Windows due to there being no triton v1.0. Can you set the following environment variable and try running again? DS_BUILD_SPARSE_ATTN=0
@mrwyattii thank you for recommending a solution. However, no luck this time.
@hamzafar I believe async_io may also be broken due to missing dependencies on Windows. Try with this environment variable set as well: DS_BUILD_AIO=0
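Both variables need to be visible to the process that runs pip. One shell-independent way to do that is to launch pip from Python with a modified environment, sketched below. The function names are made up for illustration; only the two DS_BUILD_* variables come from this thread.

```python
import os
import subprocess
import sys


def deepspeed_install_env(base_env=None):
    """Copy the environment and disable the two ops suggested in this thread."""
    env = dict(os.environ if base_env is None else base_env)
    env["DS_BUILD_SPARSE_ATTN"] = "0"
    env["DS_BUILD_AIO"] = "0"
    return env


def install_deepspeed():
    """Run `pip install deepspeed` with the flags set; returns pip's exit code."""
    cmd = [sys.executable, "-m", "pip", "install", "deepspeed"]
    return subprocess.run(cmd, env=deepspeed_install_env()).returncode
```

Calling install_deepspeed() from the target environment then runs the install with both ops disabled, without changing the variables of the surrounding shell.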
@mrwyattii no luck again :)
well looks like you are making it further into the install, so that's good news. I'm curious why you don't have permission to remove that directory. Can you try running with admin/root privileges and see if that addresses the permissions error?
I ran the Anaconda PowerShell prompt as administrator and executed the pip install deepspeed command. However, I am still getting the permission error.
(Note: these steps are for inference-only mode.) After trying forever, I got it working. Here is what I did:
- Install the VS Build Tools 2019. If you already have them installed, repair the installation;
- Install Miniconda (if you don't have it already);
- Install CUDA 11.7 from https://developer.nvidia.com/cuda-11-7-0-download-archive ;
- Open "Anaconda Prompt (MiniConda3)";
- Create a python 3.10 env using: "conda create -n dsenv python=3.10.6"
- Activate the conda env using "conda activate dsenv";
- Install Pytorch and CUDA using: "conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia";
- Close anaconda prompt;
- Open the Start -> "x64 Native Tools Command Prompt for VS 2019";
- Initialize conda on the Command prompt using "conda init cmd.exe";
- Reopen the "x64 Native Tools Command Prompt for VS 2019" AS AN ADMINISTRATOR;
- Activate the conda env using "conda activate dsenv";
- Go to your root folder (could be c:\ or any other) and clone the DeepSpeed project using "git clone https://github.com/microsoft/DeepSpeed";
- Depending on the fixes in the DeepSpeed repository, this step might or might not be needed: download the file from here (https://drive.google.com/drive/folders/11EYHosWfDLrrVbniBLV1j82qeurpGlvX?usp=sharing) and use it to replace the file at DeepSpeed\csrc\transformer\inference\csrc\pt_binding.cpp (see comments below);
- Go to the deepspeed folder using "cd DeepSpeed";
- Make 10 prayers to your god and try to install using "build_win.bat";
- A .whl will be created in the dist folder.
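Before running build_win.bat in the steps above, it can save a failed build to confirm that the prompt actually sees the tools it needs. The sketch below is an assumption about what to check, not something the build script does itself: it looks for cl (the MSVC compiler, which the x64 Native Tools prompt should put on PATH), nvcc, and git.

```python
import shutil

# Assumed tool list for illustration; adjust it to your setup.
REQUIRED_TOOLS = ("cl", "nvcc", "git")


def missing_build_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_build_tools()
print("missing tools:", ", ".join(missing) if missing else "none")
```

If cl is reported missing, the script was probably started from a regular prompt rather than the "x64 Native Tools Command Prompt for VS 2019".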
To install the generated .whl, just use:
For the Python 3.10 version: pip install deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl
For the Python 3.9 version: pip install deepspeed-0.8.3+4d27225f-cp39-cp39-win_amd64.whl
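The cp310 or cp39 part of the file name has to match the Python in the active environment, otherwise pip will refuse the wheel. A minimal sketch for checking this up front, assuming CPython and the standard wheel naming layout (the function names are hypothetical):

```python
import sys


def wheel_python_tag(filename):
    """Extract the Python tag (e.g. 'cp310') from a wheel file name."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    # Layout: name-version[-build]-python-abi-platform, so the Python tag
    # is always the third field from the end.
    return stem.split("-")[-3]


def current_python_tag():
    """Tag of the running interpreter, assuming CPython."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


def wheel_matches_interpreter(filename):
    return wheel_python_tag(filename) == current_python_tag()


name = "deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl"
print(wheel_python_tag(name), current_python_tag(), wheel_matches_interpreter(name))
```

This checks only the Python version; the win_amd64 platform tag still has to match the machine.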
Extra notes: PyTorch version 1.13.1 with CUDA 11.7 also worked for me, but since it is an older version, I did not mention it in the steps above. If you need that version, install it using "conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia"
About the replacement of file pt_binding.cpp: all I did was change lines 531, 532, 539, and 540:
New Lines 531 and 532:
{static_cast
New lines 539 and 540:
{static_cast
For anyone who just wants the final .whl to install using Python, here it is (no prayers needed): https://drive.google.com/drive/folders/117GSNHcJyzvMPTftl0aPBSwQVsU-z4bM?usp=sharing