
Not working on WSL (debian) on Windows 11

Open stevef1uk opened this issue 1 year ago • 8 comments

Just wondering if this is expected to work on Windows?

I wanted an NVIDIA GPU and it came in a gaming PC (RTX 4070 Super).

I quickly realised this project wouldn't work natively on Windows, so I tried it on WSL using Debian:

Linux W11 5.15.167.4-microsoft-standard-WSL2 #1 SMP Tue Nov 5 00:21:55 UTC 2024 x86_64 GNU/Linux

After a LOT of time trying to get the NVIDIA CUDA packages installed, I eventually managed:

 python --version
Python 3.12.2

 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Oct_29_23:50:19_PDT_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0


nvidia-smi
Fri Dec  6 19:43:15 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.05              Driver Version: 556.12         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
| 30%   21C    P8              7W /  220W |     569MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A       193      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+


Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

I hit an issue with the NVIDIA driver at the line in exo/topology/device_capabilities.py that reads the GPU name, so I changed it to a hard-coded value:

 #gpu_raw_name = pynvml.nvmlDeviceGetName(handle).upper()
 gpu_raw_name = "NVIDIA GEFORCE RTX 4070 SUPER"
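A less brittle variant of this workaround could keep the NVML query and only fall back to a fixed name when the query fails. The sketch below is an assumption, not what exo does: `resolve_gpu_name` and the `EXO_GPU_NAME_OVERRIDE` environment variable are hypothetical names, and the bytes handling assumes that some pynvml versions return `bytes` rather than `str` from `nvmlDeviceGetName`.

```python
import os


def resolve_gpu_name(query_name, fallback_env="EXO_GPU_NAME_OVERRIDE"):
    """Return an upper-cased GPU name, tolerating a failing NVML query.

    query_name   -- zero-argument callable that returns the device name,
                    e.g. lambda: pynvml.nvmlDeviceGetName(handle)
    fallback_env -- hypothetical environment variable holding a manual
                    override such as "NVIDIA GEFORCE RTX 4070 SUPER"
    """
    try:
        raw = query_name()
    except Exception:
        # The driver query failed (as it did for me under WSL), so use
        # the manual override; an empty string means "unknown".
        raw = os.environ.get(fallback_env, "")
    if isinstance(raw, bytes):
        # Assumption: older pynvml releases return bytes here.
        raw = raw.decode("utf-8", errors="replace")
    return raw.upper()
```

In device_capabilities.py this would be called as `gpu_raw_name = resolve_gpu_name(lambda: pynvml.nvmlDeviceGetName(handle))`, with the override exported in the shell only on machines where the query fails.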

With the hard-coded name, exo runs and initially looks good with just this single node. However, with the first small model, a simple 'hi' resulted in this generated response:

HowQuestion of you you you you you you you you you you you you you you you you you you you you you you you you you you

Performance was appalling, and then exo crashed.

Any ideas?

stevef1uk avatar Dec 06 '24 18:12 stevef1uk