
[BUG] pip install doesn't work. Please help.

Open kaustubhroy1995 opened this issue 2 years ago • 24 comments

Describe the bug
pip install not working for Windows 10

To Reproduce
Steps to reproduce the behavior:

  1. Go to Command Prompt
  2. Type in 'pip install deepspeed'
  3. Voila

Expected behavior
Install DeepSpeed?

System info (please complete the following information):

  • OS: Windows 10 Pro
  • GPU: RTX 2060 Super
  • Screenshot of error: Deepspeed_Issue
  • Python version: 3.10

kaustubhroy1995 avatar Jul 26 '22 16:07 kaustubhroy1995

I've repro'd your issue, I'll let you know when I have a fix. Our support on Windows is unfortunately not as thoroughly tested as on Linux. I recognize how funny that sounds since we're part of MSFT haha :)

jeffra avatar Jul 30 '22 00:07 jeffra

I repro'd this on a Windows box that does not have a GPU. Can you confirm that torch sees your GPU from Windows?

Can you share the results of torch.cuda.is_available() and torch.cuda.get_device_properties(0)?
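
For example, both checks can be run in one go from the command line (assuming torch is installed in the environment you are using):

python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_properties(0) if torch.cuda.is_available() else 'no CUDA device')"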

jeffra avatar Jul 30 '22 00:07 jeffra

Hi @jeffra, I ran the above torch commands on my Windows machine, and I can see the GPU from inside my virtual environment. However, DeepSpeed still fails to install. image

hamzafar avatar Nov 17 '22 20:11 hamzafar

@hamzafar, can you share the results of this? For some reason torch might not be detecting your CUDA_HOME path.

python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)"

jeffra avatar Nov 17 '22 20:11 jeffra

I’m currently unable to access the machine I was working on. I’ll get back to you by Wednesday.

Thanks for the reply

kaustubhroy1995 avatar Nov 17 '22 20:11 kaustubhroy1995

@jeffra I am getting None. image

Could you recommend the solution?

hamzafar avatar Nov 17 '22 20:11 hamzafar

Interesting, do you have nvcc installed somewhere on your machine? It should come with the CUDA toolkit (e.g., https://developer.nvidia.com/cuda-11.3.0-download-archive). If you have nvcc, can you share the version? e.g., nvcc --version
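
For reference, nvcc needs to be reachable from the same prompt you run pip in. If nvcc --version comes back with "is not recognized as an internal or external command", the toolkit's bin folder (by default something like C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin) likely needs to be added to PATH first.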

jeffra avatar Nov 17 '22 20:11 jeffra

No, I have not installed nvcc on my machine. Let me install it and I will share its version.

hamzafar avatar Nov 17 '22 20:11 hamzafar

I suspect that is the source of the issue here: DeepSpeed requires nvcc to be installed to compile our C++/CUDA extensions. I just created a PR (https://github.com/microsoft/DeepSpeed/pull/2519) that adds a note about this to our requirements.

jeffra avatar Nov 17 '22 20:11 jeffra

I have configured CUDA on my machine, but I still get the same error. image

hamzafar avatar Nov 18 '22 18:11 hamzafar

Interesting, glad to see nvcc is coming up okay. This error at install makes me think CUDA_HOME is still returning None from torch though. Does python -c "import torch.utils.cpp_extension; print(torch.utils.cpp_extension.CUDA_HOME)" still print None? Here's the logic that pytorch takes to find CUDA_HOME: https://github.com/pytorch/pytorch/blob/7ec8a4d2a26f717d0a4073e6005f9edfdd7ab641/torch/utils/cpp_extension.py#L86

Can you perhaps set CUDA_HOME to where your CUDA install is located? I see that on Windows it defaults to 'C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v*.*'.
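
For example, in the same Command Prompt session you run pip from (adjust the version folder to whatever toolkit you installed; in PowerShell use $env:CUDA_HOME = "path" instead):

set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7
pip install deepspeed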

jeffra avatar Nov 18 '22 18:11 jeffra

The system is not picking up the CUDA runtime. image

hamzafar avatar Nov 18 '22 19:11 hamzafar

What version of torch do you have installed here? Can you show me the exact torch.__version__? I just want to make sure the torch CUDA build is aligned with the CUDA 10.0 you have installed.
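
For example (torch.version.cuda prints None on a CPU-only build):

python -c "import torch; print(torch.__version__, torch.version.cuda)"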

jeffra avatar Nov 18 '22 19:11 jeffra

It shows me '1.13.0+cpu'. That's strange, I installed the GPU version of PyTorch.

hamzafar avatar Nov 18 '22 19:11 hamzafar

I will configure PyTorch with GPU support in a different env; hopefully it will work.

hamzafar avatar Nov 18 '22 19:11 hamzafar

I have configured PyTorch with GPU support, and I am now getting a new error while installing DeepSpeed. image

hamzafar avatar Nov 19 '22 16:11 hamzafar

> What version of torch do you have installed here? Can you show me the exact torch.__version__? I just want to make sure the torch CUDA build is aligned with the CUDA 10.0 you have installed.

Hi, I have attached a screenshot of the error. Could you confirm whether the CUDA versions are aligned?

hamzafar avatar Nov 29 '22 18:11 hamzafar

@hamzafar it looks like there is an error compiling sparse_attn - I think this is expected on Windows, since there is no Triton v1.0 available there. Can you set the following environment variable and try running again? DS_BUILD_SPARSE_ATTN=0
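
For example, from the same prompt you install from (Command Prompt syntax; in PowerShell set it with $env:DS_BUILD_SPARSE_ATTN = "0"):

set DS_BUILD_SPARSE_ATTN=0
pip install deepspeed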

mrwyattii avatar Dec 06 '22 19:12 mrwyattii

@mrwyattii thank you for the recommendation. However, no luck this time. image

hamzafar avatar Dec 06 '22 19:12 hamzafar

@hamzafar I believe async_io may also be broken due to missing dependencies on Windows. Try with this environment variable set as well: DS_BUILD_AIO=0
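
That is, with both extension builds disabled before retrying the install:

set DS_BUILD_SPARSE_ATTN=0
set DS_BUILD_AIO=0
pip install deepspeed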

mrwyattii avatar Dec 06 '22 19:12 mrwyattii

@mrwyattii no luck again :) image

hamzafar avatar Dec 06 '22 19:12 hamzafar

Well, it looks like you are making it further into the install, so that's good news. I'm curious why you don't have permission to remove that directory. Can you try running with admin/root privileges and see if that addresses the permissions error?

mrwyattii avatar Dec 07 '22 00:12 mrwyattii

I ran the Anaconda PowerShell prompt as administrator and executed the pip install deepspeed command. However, I am still getting the permission error.

hamzafar avatar Dec 07 '22 11:12 hamzafar

(Note: these steps are for inference-only mode.) After trying forever, I got it working. Here is what I did (a condensed command summary follows the list):

  • Install the VS Build Tools 2019. If you already have them installed, repair them;
  • Install Miniconda (if you don't have it already);
  • Install CUDA 11.7 from https://developer.nvidia.com/cuda-11-7-0-download-archive ;
  • Open "Anaconda Prompt (MiniConda3)";
  • Create a python 3.10 env using: "conda create -n dsenv python=3.10.6"
  • Activate the conda env using "conda activate dsenv";
  • Install Pytorch and CUDA using: "conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia";
  • Close anaconda prompt;
  • Open the Start -> "x64 Native Tools Command Prompt for VS 2019";
  • Initialize conda on the Command prompt using "conda init cmd.exe";
  • Reopen the "x64 Native Tools Command Prompt for VS 2019" AS AN ADMINISTRATOR;
  • Activate the conda env using "conda activate dsenv";
  • Go to your root folder (could be C:\ or any other) and clone the DeepSpeed project: "git clone https://github.com/microsoft/DeepSpeed";
  • Depending on the fixes in the DeepSpeed repository, this step might or might not be needed: download this file (https://drive.google.com/drive/folders/11EYHosWfDLrrVbniBLV1j82qeurpGlvX?usp=sharing) and replace the file at DeepSpeed\csrc\transformer\inference\csrc\pt_binding.cpp (see comments below);
  • Go to the deepspeed folder using "cd DeepSpeed";
  • Make 10 prayers to your god and try to install using "build_win.bat";
  • A .whl will be created in the dist folder.
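
Condensed, the commands from the steps above look roughly like this (run from the "x64 Native Tools Command Prompt for VS 2019" opened as administrator, with conda already initialized for cmd):

conda create -n dsenv python=3.10.6
conda activate dsenv
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
git clone https://github.com/microsoft/DeepSpeed
cd DeepSpeed
build_win.bat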

To install the generated .whl, just use:
For the Python 3.10 version: pip install deepspeed-0.8.3+6eca037c-cp310-cp310-win_amd64.whl
For the Python 3.9 version: pip install deepspeed-0.8.3+4d27225f-cp39-cp39-win_amd64.whl

Extra note: PyTorch version 1.13.1 with CUDA 11.7 also worked for me, but since it is an older version, I did not mention it in the steps above. If you need that version, install it using "conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia"

About the replacement of the file pt_binding.cpp: all I did was change lines 531, 532, 539, and 540.

New lines 531 and 532:
{static_cast(hidden_dim * Context::Instance().GetMaxTokenLenght()),
static_cast(k * Context::Instance().GetMaxTokenLenght()),

New lines 539 and 540:
{static_cast(hidden_dim * Context::Instance().GetMaxTokenLenght()),
static_cast(k * Context::Instance().GetMaxTokenLenght()),

For anyone who just wants the final .whl to install with pip, here it is (no prayers needed): https://drive.google.com/drive/folders/117GSNHcJyzvMPTftl0aPBSwQVsU-z4bM?usp=sharing

marcoseduardopm avatar Apr 11 '23 03:04 marcoseduardopm