
Feature Request: Add ROCm Support for AMD GPUs or OpenCL Support for Integrated Graphics Acceleration

Open ShadowLoveElysia opened this issue 1 year ago • 17 comments

First of all, thanks for creating such a useful tool. It's been really helpful for a lot of us!

I wanted to bring up something that would make the software even better for users like me who rely on AMD hardware. Currently, the software supports CUDA for GPU acceleration, which is great for NVIDIA users. However, it would be fantastic if we could also have support for ROCm or OpenCL to take advantage of AMD GPUs or integrated graphics.

What I'm suggesting:

- ROCm Support: Adding support for ROCm, AMD's open-source platform for GPU computing, would allow AMD GPU owners to benefit from GPU acceleration within the software.
- OpenCL for Integrated Graphics: Similar to how some other tools handle it (like UVR), supporting OpenCL would enable the use of integrated graphics for acceleration, which is particularly beneficial for users with AMD APUs.

Why this would be great:

- Improved Performance: Leveraging AMD GPUs or integrated graphics could lead to faster processing times.
- Broader Compatibility: This change would cater to a wider range of hardware setups, making the tool more accessible.

I understand that adding new features takes time and effort, but I believe these additions could significantly enhance the user experience for those using AMD hardware. I hope this feature can be considered in future updates.

ShadowLoveElysia avatar Dec 07 '24 06:12 ShadowLoveElysia

This project uses the torch library, so I think you can use ROCm if you install torch with ROCm support. Check here: https://pytorch.org/

[image]
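
As a quick way to check which GPU backend an installed torch wheel actually targets, a minimal sketch (the `torch.version.hip` / `torch.version.cuda` attributes are standard torch metadata; the install command in the comment is indicative, and the exact index URL should be taken from the selector on pytorch.org):

```python
import importlib.util

# A ROCm build of torch is installed with an index URL of the form
#   pip3 install torch --index-url https://download.pytorch.org/whl/rocmX.Y
# (pick the exact URL from the selector on https://pytorch.org/).
def torch_backend():
    """Best-effort report of the GPU backend the installed torch build targets."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if getattr(torch.version, "hip", None):   # set only in ROCm builds
        return f"ROCm {torch.version.hip}"
    if getattr(torch.version, "cuda", None):  # set only in CUDA builds
        return f"CUDA {torch.version.cuda}"
    return "CPU-only build"

print(torch_backend())
```

If this reports a CPU-only build, the ROCm wheel was not picked up and GPU acceleration will silently fall back to CPU.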

ZFTurbo avatar Dec 07 '24 07:12 ZFTurbo

Do you think it would be feasible to use OpenCL for acceleration with integrated graphics? Intel GPUs currently do not support ROCm or CUDA. Adding OpenCL support would enable acceleration using both AMD and Intel integrated graphics, as well as Intel dedicated GPUs. This is how UVR handles it.

ShadowLoveElysia avatar Dec 07 '24 07:12 ShadowLoveElysia

I think OpenCL is also possible with this pytorch fork: https://github.com/artyom-beilis/pytorch_dlprim

Some changes to this repository's code may be needed around the 'cuda' keyword, but I personally can't check it.
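
The 'cuda' keyword changes would mostly amount to routing device selection through a fallback chain instead of hard-coding 'cuda'. A hypothetical sketch (the `pick_device` helper and the backend names passed to it are illustrative, not code from this repository):

```python
def pick_device(backends, device_id=0):
    """Return a torch-style device string from the first available backend.

    backends: dict mapping backend name -> availability flag, in preference
    order (e.g. 'cuda' first, then 'privateuseone' as used by torch-directml,
    then an OpenCL backend). Falls back to 'cpu' when nothing is available.
    """
    for name, available in backends.items():
        if available:
            return f"{name}:{device_id}"
    return "cpu"

# Example: no GPU backend available -> falls back to CPU
print(pick_device({"cuda": False, "privateuseone": False}))  # cpu
```

The actual device string accepted by a given fork (dlprim, torch-directml) depends on how that fork registers its backend, so the names above are assumptions.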

ZFTurbo avatar Dec 07 '24 07:12 ZFTurbo

Actually, UVR has always used DirectML rather than OpenCL. It was a naming mistake by Anjok, corrected in the newer beta Roformer patches.

deton24 avatar Dec 24 '24 12:12 deton24

I think it can be used with this repo with minimum changes: https://learn.microsoft.com/en-us/windows/ai/directml/pytorch-windows

ZFTurbo avatar Dec 24 '24 13:12 ZFTurbo

Hi, I tried using pytorch-windows and changed a few lines in inference.py like this:

import torch_directml
...
def proc_folder(args):
    ...
    if torch_directml.is_available():
        print('DirectML is available, use --force_cpu to disable it.')
        if isinstance(args.device_ids, list):
            device = torch_directml.device(args.device_ids[0])
            device_name = torch_directml.device_name(args.device_ids[0])
        else:
            device = torch_directml.device(args.device_ids)
            device_name = torch_directml.device_name(args.device_ids)

but I got this error:

DirectML is available, use --force_cpu to disable it.
Using device:  AMD Radeon RX 6800
Start from checkpoint: E:\Music-Source-Separation-Training-main\checkpoints\MelBandRoformer.ckpt
Instruments: ['vocals', 'other']
Model load time: 1.70 sec
Total files found: 1. Using sample rate: 44100
Processing track: E:\input\test.flac

Processing audio chunks:   0%|          | 0/8037372 [00:00<?, ?it/s]
[F1228 19:15:36.000000000 dml_util.cc:118] Invalid or unsupported data type ComplexFloat.
Process failed with return code 3221226505

I came across this page, which says that complex types aren't supported in DirectML. Any ideas how it might be possible to work around this?

aqst avatar Dec 28 '24 08:12 aqst

Maybe it's possible, but for this we would need to change the STFT data conversion inside the MelRoformer model to avoid complex numbers. I'm not sure if it's easy to do.

ZFTurbo avatar Dec 28 '24 15:12 ZFTurbo

Maybe this repository would help: https://github.com/DakeQQ/STFT-ISTFT-ONNX, but I currently have no time to think about it.

KitsuneX07 avatar Dec 29 '24 11:12 KitsuneX07

Maybe you could find how Anjok handles Roformers using DirectML in the UVR's code: https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add


deton24 avatar Dec 29 '24 22:12 deton24

I am trying to modify inference.py to adapt it to DirectML inference, but I don't fully understand how DirectML works. There may be other areas beyond inference.py that need modification, so I might need some time for research and trial and error, and I may not be able to produce a decent version. If anyone has ideas, feel free to try modifying it; no need to wait for me.

ShadowLoveElysia avatar Dec 30 '24 03:12 ShadowLoveElysia

I also noticed that the developers are trying to add DirectML support, and Anjok's approach is worth discussing. As the saying goes, "one generation does the hard work, and the next generation benefits from it." I will try to research this together with my friends. (Apologies, I accidentally sent my previous two comments before finishing them, which may lead to duplicate messages.)

ShadowLoveElysia avatar Dec 30 '24 03:12 ShadowLoveElysia

If needed, UVR's beta with Roformers and DirectML code is available here: https://github.com/Anjok07/ultimatevocalremovergui/tree/v5.6.0_roformer_add%2Bdirectml

jarredou avatar Dec 30 '24 04:12 jarredou

Do you know whether this branch can work on Linux? Edit: from what people wrote, yes.


deton24 avatar Dec 30 '24 16:12 deton24

@aqst

In the Roformer code you can try changing these lines:

stft_repr = torch.stft(raw_audio, **self.stft_kwargs, window=stft_window, return_complex=True)
stft_repr = torch.view_as_real(stft_repr)

to this:

stft_repr = torch.stft(raw_audio, **self.stft_kwargs, window=stft_window, return_complex=False)

It's equivalent, and you avoid the complex64 tensor type.
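
The equivalence rests on memory layout: a complex tensor viewed as real is just the (real, imag) pairs that a real-valued STFT output carries directly in its trailing dimension. A small illustration of that identity, in NumPy only to keep the example backend-independent:

```python
import numpy as np

# A complex64 value is stored as two interleaved float32s (real, imag),
# which is exactly what torch.view_as_real exposes as a trailing dim of 2.
z = np.array([1 + 2j, 3 - 4j], dtype=np.complex64)
as_real = z.view(np.float32).reshape(-1, 2)    # like torch.view_as_real
stacked = np.stack([z.real, z.imag], axis=-1)  # explicit (real, imag) pairs
print(np.allclose(as_real, stacked))  # True
```

So switching `return_complex` changes only the dtype the backend sees, not the values the model computes with.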

ZFTurbo avatar Jan 01 '25 16:01 ZFTurbo

I tried changing that in mel_band_roformer.py but unfortunately I got the same error.

I also saw line 487 of bs_roformer.py and I tried something similar to run that part on the CPU:

stft_repr = torch.stft(raw_audio.cpu(), **self.stft_kwargs, window=stft_window.cpu(), return_complex=True)
stft_repr = torch.view_as_real(stft_repr).to(device)

but then I got this error:

  File "E:\Music-Source-Separation-Training-main\models\bs_roformer\mel_band_roformer.py", line 531, in forward
    x = stft_repr[batch_arange, self.freq_indices]
        ~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4, 1], [3958]

I'm not sure what that means, but maybe there's some kind of issue in DirectML.
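
For what it's worth, index shapes [4, 1] and [3958] do broadcast under normal advanced-indexing rules (to [4, 3958]), which supports the suspicion that this is a DirectML limitation rather than a bug in the model code. A NumPy demonstration of the same indexing pattern (array sizes here are illustrative):

```python
import numpy as np

# Same shapes as the failing line: a [batch, 1] row index against a
# flat [3958] column index. On CPU/NumPy these broadcast to [4, 3958].
x = np.arange(4 * 4000).reshape(4, 4000)
batch_arange = np.arange(4)[:, None]   # shape (4, 1)
freq_indices = np.arange(3958)         # shape (3958,)
out = x[batch_arange, freq_indices]
print(out.shape)  # (4, 3958)
```

PyTorch follows the same broadcasting rules for integer index tensors, so the shape mismatch error points at the backend, not the indexing expression.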

aqst avatar Jan 02 '25 07:01 aqst

[quoted aqst's earlier comment reporting the DirectML "Invalid or unsupported data type ComplexFloat" error]

Can confirm this issue still exists now. I am using torch-directml with torch 2.4.1 and the latest Mel-Band Roformer. I used this to initialize DirectML:

dml_device_name = "privateuseone:0"
# Initialize determined_device, defaulting to CPU
determined_device = torch.device("cpu")
print("Attempting to set up processing device...")

try:
    print(f"Testing DirectML device: {dml_device_name}...")
    # Test if DirectML is available and functional with a small tensor
    _ = torch.tensor([1.0]).to(dml_device_name)

    # If the above line didn't raise an error, DirectML is working
    determined_device = torch.device(dml_device_name)
    print(f"DirectML device {determined_device} is available. Attempting to move model to this device.")

    model = model.to(determined_device)  # Move the model

    # Correctly check the device of the model's parameters
    if list(model.parameters()):  # Check if model has parameters
        actual_model_device = next(model.parameters()).device
        print(f"Model parameters are now on device: {actual_model_device}")
        if actual_model_device.type != 'privateuseone':
            # This should not happen if .to(determined_device) was successful
            print(f"Warning: Model parameters are on {actual_model_device} despite targeting {determined_device}. Check model's .to() implementation.")
            raise RuntimeError(f"Model moved to {actual_model_device} instead of {determined_device}")
    else:
        print("Model has no parameters. The .to(device) call was made.")

    print(f"Successfully set target device to DirectML: {determined_device}")

except Exception as e:
    print(f"An error occurred during DirectML setup or model transfer. Error: {e}")
    # If the error was an AttributeError, DirectML init was likely fine but the
    # check was wrong; any other error (e.g. from the tensor test) suggests
    # DirectML itself has an issue.
    import traceback
    traceback.print_exc()  # Print the full traceback to see the exact error

    print("Falling back to CPU.")
    determined_device = torch.device("cpu")
    if 'model' in locals() and model is not None:  # Check if model is defined
        model = model.to(determined_device)

print(f"--- Proceeding with inference on device: {determined_device} ---")
# Use 'determined_device' in run_folder and subsequent tensor operations
run_folder(model, args, config, determined_device, verbose=False)

Did anybody manage to make Melband Roformer work with DirectML?

smbursuc avatar May 25 '25 14:05 smbursuc

Just to let you know, someone added partial MPS support to BS/Mel-Roformer. It's already 2x faster than CPU. Maybe it will be useful: https://github.com/axeldelafosse/BS-RoFormer/

deton24 avatar Aug 04 '25 21:08 deton24