Issue Installing Apex in WSL Environment

Open l8g opened this issue 2 years ago • 5 comments

🐛 Bug

I'm having a problem installing Apex in a WSL environment. The Apex installation script appears to look up the CUDA installation directory and run nvcc -V from it. In a WSL environment, even though CUDA is supported through the NVIDIA WSL driver, there may be no full CUDA installation directory, and nvcc may not be on the PATH.

To Reproduce

I attempted to install Apex in WSL using the following commands:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir ./

I then received the following error:

File "/home/ldd/nlp/apex/setup.py", line 130, in <module>
  _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ldd/nlp/apex/setup.py", line 17, in get_cuda_bare_metal_version
  raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
                                        ~~~~~~~~~^~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Expected behavior

I would expect Apex to be installable in a WSL environment without needing a full CUDA installation directory or nvcc.

Environment

OS: Ubuntu 20.04 on WSL 2
Python version: 3.11
PyTorch version: 2.0.1
CUDA version: NVIDIA CUDA 11.3 driver for Windows
GPU models: [e.g. NVIDIA RTX 2080]
Apex version: master branch as of 2023-06-06
GCC version: [e.g. 7.5]
Any other relevant information:

Additional context

I'm trying to run a deep learning project that depends on Apex. I'm unable to run this project because Apex cannot be installed in my WSL environment.

l8g avatar Jun 06 '23 10:06 l8g

Did you set the CUDA_HOME environment variable? If not, could you try setting it?

crcrpar avatar Jun 06 '23 18:06 crcrpar

I appreciate your prompt response. I would like to clarify a few things about my setup:

  1. I am working within a WSL2 (Windows Subsystem for Linux) environment, not a traditional Linux one.
  2. In my Windows system, the CUDA installation directory does not contain all the files and directories that WSL2 would expect from a full CUDA install. In particular, nvcc is missing because WSL2, by design, does not have access to a full CUDA installation that is present in the Windows system.
  3. I have tried setting the CUDA_HOME environment variable in WSL2, but it did not solve the issue, as the directory that this variable points to does not contain nvcc.
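
For anyone who wants to check points 2 and 3 on their own machine, here is a minimal sketch that reports whether CUDA_HOME is set and whether nvcc can be found under it or on PATH. `locate_nvcc` is a hypothetical helper written for this comment; it is not part of Apex or PyTorch, and it only looks at the environment, not at what Apex's setup.py actually resolves.

```python
import os
import shutil


def locate_nvcc(env=None):
    """Return the path to nvcc, or None if it cannot be found.

    Looks in $CUDA_HOME/bin first, then falls back to searching PATH.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        # Only accept a regular, executable file.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # shutil.which returns None when nothing on the given PATH matches.
    return shutil.which("nvcc", path=env.get("PATH", ""))


if __name__ == "__main__":
    found = locate_nvcc()
    print("CUDA_HOME =", os.environ.get("CUDA_HOME"))
    print("nvcc      =", found if found else "not found")
```

If this prints "not found" while CUDA_HOME is set, the variable points at a directory without bin/nvcc, which is the situation described in point 3.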

Given these constraints, I am currently unable to install Apex in my WSL2 environment. I was wondering if you have any recommendations for installing Apex under WSL2, or if there are plans to support WSL2 in the future?

Thank you for your time and consideration.

Best regards,

l8g avatar Jun 07 '23 07:06 l8g

Thank you for the clarification; I misunderstood some bits. Could you try commenting out https://github.com/NVIDIA/apex/blob/05091d498d21058a0fe736b828c43431d4f0dda2/setup.py#L130 ? Since your install command doesn't depend on any custom extensions, I don't think the CUDA version check is needed.
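
As an alternative to commenting the line out, the check could in principle be guarded so that it is skipped when no CUDA directory is detected. The following is only a sketch of that idea, not the code in Apex's setup.py: `parse_release` and `cuda_release_or_none` are hypothetical names, and the parsing assumes `nvcc -V` prints a line containing `release <version>,`.

```python
import os
import subprocess


def parse_release(nvcc_output):
    """Extract the release string (e.g. "11.3") from `nvcc -V` output."""
    tokens = nvcc_output.split()
    if "release" not in tokens:
        return None
    idx = tokens.index("release") + 1
    if idx >= len(tokens):
        return None
    # The token after "release" looks like "11.3," so drop the comma.
    return tokens[idx].split(",")[0]


def cuda_release_or_none(cuda_dir):
    """Return the nvcc release string, or None when nvcc is unavailable."""
    if cuda_dir is None:
        # This is the case from the traceback: None + "/bin/nvcc" raises TypeError.
        return None
    nvcc = os.path.join(cuda_dir, "bin", "nvcc")
    if not os.path.isfile(nvcc):
        return None
    try:
        raw = subprocess.check_output([nvcc, "-V"], universal_newlines=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_release(raw)
```

A caller would then treat a `None` result as "no toolkit available" and skip anything that needs the version, which is fine for a Python-only install.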

crcrpar avatar Jun 07 '23 07:06 crcrpar

Thank you very much for your guidance. I have successfully installed Apex and it seems to be functioning correctly. I ran a script that tests Apex's Automatic Mixed Precision (AMP) feature and everything worked as expected.

However, during the test, I received a warning that the multi_tensor_applier fused unscale kernel is unavailable, and Apex was using a Python fallback. The message suggested this might be because Apex was installed without --cuda_ext --cpp_ext.

This isn't causing me any problems at the moment, but I am wondering if it may impact performance, and if so, what I should do about it. My understanding is that compiling the CUDA extensions in WSL2 might be nontrivial due to the unique setup.
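
To see which path an install is on, a quick check is whether the compiled extension module can be found at all. This is a minimal sketch: `extension_available` is a hypothetical helper, and `amp_C` is assumed to be the module the warning refers to. Finding the module does not guarantee it loads successfully.

```python
import importlib.util


def extension_available(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: "amp_C" is the compiled module built by --cuda_ext.
if extension_available("amp_C"):
    print("amp_C found: the fused kernels may be usable")
else:
    print("amp_C not found: Apex will use the Python fallback")
```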

Any further advice you could provide on this topic would be greatly appreciated.

l8g avatar Jun 07 '23 12:06 l8g

For others who run into this problem: besides commenting out the version check, I had to add "packaging" and "torch" as requirements in pyproject.toml to make it work.

ingura avatar Sep 02 '23 12:09 ingura