AMD GPU on Applio powered by Zluda

Open AznamirWoW opened this issue 1 year ago • 8 comments

  1. Install prerequisites:
     a) HIP SDK 6.1.2: https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
     b) Zluda 3.8: https://github.com/lshqqytiger/ZLUDA/releases/tag/rel.86cdab3b14b556e95eafe370b8e8a1a80e8d093b
     c) If your GPU is not supported by the HIP SDK, such as the 6600/6700(XT), you need to find or build a custom set of libraries.

  2. Modify run-install.bat to pull cu118 torch libraries instead of cu121:
     pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118

  3. Execute run-install.bat to build the local environment.

  4. Unzip Zluda into the zluda folder, then patch the torch libraries using:
     copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
     copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
     copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

  5. apply the patch

  6. Start Applio using zluda\zluda.exe -- env\python.exe app.py
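
For anyone who would rather script step 4 than type the three copy commands, here is a minimal Python sketch of the same operation. The file names and folders come from the steps above; the function name `patch_torch_libs` and the up-front check for missing files are illustrative additions, not part of Applio or Zluda.

```python
import shutil
from pathlib import Path

# ZLUDA DLL -> name of the CUDA 11.8 DLL it replaces inside torch\lib (from step 4).
ZLUDA_TO_TORCH = {
    "cublas.dll": "cublas64_11.dll",
    "cusparse.dll": "cusparse64_11.dll",
    "nvrtc.dll": "nvrtc64_112_0.dll",
}


def patch_torch_libs(zluda_dir: Path, torch_lib_dir: Path) -> list[Path]:
    """Copy the ZLUDA DLLs over the matching CUDA DLLs and return the files written."""
    # Check everything first so a missing file does not leave a half-patched folder.
    missing = [name for name in ZLUDA_TO_TORCH if not (zluda_dir / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {zluda_dir}: {', '.join(missing)}")
    if not torch_lib_dir.is_dir():
        raise NotADirectoryError(f"{torch_lib_dir} not found; run run-install.bat first")

    written = []
    for source_name, target_name in ZLUDA_TO_TORCH.items():
        target = torch_lib_dir / target_name
        shutil.copyfile(zluda_dir / source_name, target)  # overwrites, like copy /y
        written.append(target)
    return written
```

Run from the Applio folder, the call would be `patch_torch_libs(Path("zluda"), Path("env/Lib/site-packages/torch/lib"))`, which mirrors the relative paths used in the copy commands.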

AznamirWoW • Jul 14 '24 15:07

How do you apply the patch? I get a lot of errors that say "patch does not apply".

27kaive • Jul 16 '24 04:07

I've created a proper pull request with the required changes. You can apply them manually to 3.2.1 version if you want.

AznamirWoW • Jul 16 '24 05:07

I am not really good at coding, so where do I paste this, please?
copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

tej1940 • Jul 22 '24 12:07

Note that the first time Zluda gets a task to process, it may take 10-20 minutes to compile the kernel code. During this time there is no visible output. Sit tight and wait until it is done.

AznamirWoW • Jul 25 '24 22:07

Is the version you made working?

fcastro97 • Aug 04 '24 05:08

works fine with 3.2.1 and 3.2.2

AznamirWoW • Aug 04 '24 10:08

I will leave this issue pinned so that users can share their feedback, and we can consider an implementation in the near future.

blaisewf • Aug 05 '24 16:08

> I am not really good at coding, so where do I paste this, please?
> copy zluda\cublas.dll env\Lib\site-packages\torch\lib\cublas64_11.dll /y
> copy zluda\cusparse.dll env\Lib\site-packages\torch\lib\cusparse64_11.dll /y
> copy zluda\nvrtc.dll env\Lib\site-packages\torch\lib\nvrtc64_112_0.dll /y

You use these commands in cmd ;)

But first, use cd to go to your Applio directory.

If your Applio is on another drive, use cd /d, for example: cd /d d:\Applio

Minksh • Aug 08 '24 11:08

All of this is written for Windows; Linux is different. I have not tested copying the libs and launching. The HIP SDK installs with the ROCm installation, so you need to find instructions for your Linux distro. Modify run-install.sh, then execute run-install.sh. Starting Applio (assuming you are inside the Applio folder and the zluda folder is there) is: LD_LIBRARY_PATH="zluda:$LD_LIBRARY_PATH" env\python app.py

But which libraries do I need to copy? The folder "env\lib\python3.10\site-packages\torch\lib" exists. For 1 I have "libcublas.so.11" and "libcublasLt.so.11"; for 3 I have "libnvrtc-builtins.so.11.8", "libnvrtc-672ee683.so.11.2" and "libcaffe2_nvrtc.so". I don't have anything related to "cusparse". What do I need to copy and/or replace? (Fedora Linux 40, Python 3.10, completed the first 3 steps from the original post)

Osony • Aug 20 '24 04:08

There's no easy way to do that on Linux. Most likely you need to build Zluda, then you need to build PyTorch with Zluda. https://github.com/pytorch/pytorch?tab=readme-ov-file#from-source and https://github.com/lshqqytiger/ZLUDA?tab=readme-ov-file#pytorch

AznamirWoW • Aug 20 '24 04:08

What about using pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 --index-url https://download.pytorch.org/whl/rocm6.0 instead of pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118? Is there any other Nvidia/CUDA-related code aside from PyTorch? There is some code that looks for the right GPU in config.py, line 28. According to the PyTorch docs, I need to change "cuda:0" to "cuda", but that doesn't help: I get the error "RuntimeError: HIP error: invalid device function" and I don't know how to fix it.

Osony • Aug 20 '24 06:08

If you're using an RX 7900 variety, you don't need Zluda; you can just use ROCm PyTorch.
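
On the "cuda:0" versus "cuda" question: ROCm builds of PyTorch reuse the torch.cuda API and the "cuda" device strings, so that change on its own would not be expected to switch backends. A rough sketch for checking which kind of build is actually installed follows. The function name `describe_backend` is made up for illustration, and the "[ZLUDA]" device-name suffix is an assumption about how the Zluda fork reports devices, so verify it on your own setup.

```python
def describe_backend(torch) -> str:
    """Classify the installed PyTorch build as 'cpu', 'rocm', 'zluda' or 'cuda'.

    The torch module is passed in so the check can be tried without importing
    torch at module load time.
    """
    if not torch.cuda.is_available():
        return "cpu"
    # ROCm wheels report a HIP version; CUDA wheels leave torch.version.hip as None.
    if getattr(torch.version, "hip", None):
        return "rocm"
    # Assumption: the Zluda fork appends "[ZLUDA]" to the reported device name.
    if torch.cuda.get_device_name(0).endswith("[ZLUDA]"):
        return "zluda"
    return "cuda"
```

Inside the Applio environment this would be called as `import torch; print(describe_backend(torch))`. A result of "rocm" together with "HIP error: invalid device function" often points to a wheel that has no kernels built for that GPU's architecture, though that is a general observation rather than something confirmed in this thread.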

AznamirWoW • Aug 20 '24 11:08

Does this work with Windows? or do you absolutely need Linux?

BSIdro • Aug 20 '24 17:08

It is quite the opposite. Making it work on Windows is pretty easy; making it work on Linux is a huge P.I.T.A. Hopefully my PR will be included in the build soon :)

AznamirWoW • Aug 20 '24 17:08

As for me, I'm a novice. I'm trying to get it to work with my RX 6800, but I don't understand how I should proceed.

BSIdro • Aug 20 '24 18:08

Right now you can make the changes manually, using the PR as an example:

https://github.com/IAHispano/Applio/pull/593/commits/2c18fc86ea4cb9aba1e2f4224dca1c88fb6ca25e

there's a readme file to follow as well

AznamirWoW • Aug 20 '24 18:08

I don't understand. I tried to install all this, but I got this error in my cmd when I wanted to train the pth:

An error occurred processing 0_99.wav: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Disabling CUDNN for traning with Zluda
Process Process-1:
Traceback (most recent call last):
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 315, in _bootstrap
    self.run()
  File "C:\ApplioV3.2.1\env\lib\multiprocessing\process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "C:\ApplioV3.2.1\rvc\train\train.py", line 396, in run
    net_g = DDP(net_g, device_ids=[rank])
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\nn\parallel\distributed.py", line 797, in __init__
    _sync_module_states(
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 292, in _sync_module_states
    _sync_params_and_buffers(process_group, module_states, broadcast_bucket_size, src)
  File "C:\ApplioV3.2.1\env\lib\site-packages\torch\distributed\utils.py", line 306, in _sync_params_and_buffers
    dist._broadcast_coalesced(
RuntimeError: CUDA error: operation not supported
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

EDIT: It's good, I think it works. I had installed everything incorrectly: I didn't take the time to run "run-install.bat" before applying everything else. However, sometimes not much happens; I stay at 0% for several minutes and then things start moving. I guess it's because this is the first use, as mentioned above.

EDIT 2: It works, but it seems rather slow to me, and my GPU doesn't seem to be used much when I look in the task manager. I would say training takes between 3 and 4 minutes per epoch, with the GPU at 2% and the CPU at 35%.

BSIdro • Aug 20 '24 21:08

Task Manager does not always report the correct load. The Adrenalin control panel may show correct values. Depending on the size of the training data set, training is 4-5x faster than on CPU.

As for inference, it is ~10x faster than on CPU.

AznamirWoW • Aug 21 '24 02:08

Good! Indeed, my 6800 works well. It seems to take quite long to train a model, but I think that's due to my 30-minute dataset. Or is it because it's the first time, so I have to let it finish and then the next training run will be faster?

4 minutes per epoch currently. is this normal?

BSIdro • Aug 21 '24 15:08

not the best speed but considering your dataset it’s okay

blaisewf • Aug 21 '24 16:08

With 23 min in sliced_audios, 10 epochs take ~30 minutes on my 6700XT. With 42 min in sliced_audios, 10 epochs take just under an hour.

Depending on the quality of the data and whether you use a custom pretrained model, you don't need that many epochs anyway. In some cases I had good results with just 20.
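
To put the timings in this thread on the same scale, here is a small back-of-the-envelope calculation in Python. The numbers are the ones quoted above; treating the "30 minute dataset" as 30 minutes of sliced audio, and "just under an hour" as 60 minutes, are simplifying assumptions.

```python
# (minutes of audio, minutes per epoch), taken from the timings quoted in the thread.
RUNS = {
    "RX 6700 XT, 23 min of audio": (23, 30 / 10),  # 10 epochs in ~30 minutes
    "RX 6700 XT, 42 min of audio": (42, 60 / 10),  # 10 epochs in just under an hour (upper bound)
    "RX 6800, 30 min of audio": (30, 4),           # ~4 minutes per epoch
}


def epoch_minutes_per_audio_minute(audio_minutes: float, epoch_minutes: float) -> float:
    """Training time per epoch, normalised by the amount of audio in the dataset."""
    return epoch_minutes / audio_minutes


for label, (audio, epoch) in RUNS.items():
    ratio = epoch_minutes_per_audio_minute(audio, epoch)
    print(f"{label}: {epoch:.1f} min/epoch, {ratio:.2f} min per minute of audio")
```

All three runs come out at roughly 0.13 to 0.14 minutes of training per epoch for each minute of audio, so 4 minutes per epoch on a 30-minute dataset is in line with the 6700XT figures.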

AznamirWoW • Aug 21 '24 16:08

Indeed, I did a test on 30 epochs, the model was good enough! Thanks a lot for the help! I can now use my AMD GPU to train models :)

BSIdro • Aug 22 '24 00:08

merged

blaisewf • Aug 23 '24 01:08