ultimatevocalremover_api

[Bug] MDX23C-8KFFT-InstVoc_HQ bug on linux (Google Colab)

Open ShiromiyaG opened this issue 10 months ago • 7 comments

I was testing MDX23C-8KFFT-InstVoc_HQ on Google Colab and was surprised by the result: the audio was slowed down, the singer was singing slowly, and the output was longer than the input. I tested the same song on Windows with the same settings, and the result was normal. Here is the code I used, both in Colab and on Windows:

MDX23C = models.MDXC(
    name="MDX23C-8KFFT-InstVoc_HQ",
    other_metadata={
        'is_mdx_c_seg_def': True,
        'segment_size': 384,
        'batch_size': 8,
        'overlap_mdx23': 8,
        'semitone_shift': 0,
    },
    device=device,
    logger=None,
)
res = MDX23C(input_file)
vocals = res["vocals"]
af.write(f"{no_inst_folder}/{basename}_MDX23C.wav", vocals, MDX23C.sample_rate)

Here is the link to the songs: https://drive.google.com/drive/folders/11aete_dd56XqR68P2cr_BMRlPhvHb7W0?usp=drive_link

And also an Audacity photo of the songs: image

ShiromiyaG avatar Apr 05 '24 17:04 ShiromiyaG

Are you sure that the input_file has a 44100 Hz sampling rate? The current code doesn't resample automatically.
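One way to rule this out is to read the sample rate from the file header before running the separation. The snippet below is only a minimal sketch using Python's standard `wave` module (so it covers PCM WAV files only); the helper names `wav_sample_rate` and `slowdown_factor` are hypothetical and not part of this library. It also shows why a rate mismatch would produce exactly the reported symptom: samples recorded at one rate but written with a lower rate in the header play back slower and longer.

```python
import wave

EXPECTED_SR = 44100  # the rate asked about in this thread


def wav_sample_rate(path):
    """Return the sample rate stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def slowdown_factor(actual_sr, written_sr):
    """How much longer audio plays when samples recorded at actual_sr
    are saved with written_sr in the header (no resampling done)."""
    return actual_sr / written_sr


def check_input(path):
    """Print a warning if the file is not at the expected rate."""
    sr = wav_sample_rate(path)
    if sr != EXPECTED_SR:
        factor = slowdown_factor(sr, EXPECTED_SR)
        print(f"{path}: {sr} Hz, output would be about {factor:.3f}x longer "
              f"if written as {EXPECTED_SR} Hz without resampling")
    return sr
```

For example, a 48000 Hz input written out as 44100 Hz would be roughly 1.088 times longer, which is audible as a slower, lower-pitched voice.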

MohannadEhabBarakat avatar Apr 05 '24 20:04 MohannadEhabBarakat

@MohannadEhabBarakat Yes, I'm sure. I don't think I've ever used Hi-Res audio for separation; all the audio I use comes from Deezer.

ShiromiyaG avatar Apr 05 '24 20:04 ShiromiyaG

@MohannadEhabBarakat Also, I used the same audio on both Windows and Colab and got different results, which I found strange. Maybe it has something to do with package versions. Here is the requirements file that I used in Colab: requirements.txt. I'm going to test today with VR models using the same package versions and report the results.

ShiromiyaG avatar Apr 05 '24 21:04 ShiromiyaG

I just tested with two models, a VR (karokee_4band_v2_sn) and an MDX (Reverb HQ), and both gave normal results. I remembered that in my last tests I used videos from YT, not audio from Deezer, but I don't think that is the problem, since the normal results from VR and MDX were also obtained with a video from YT.

ShiromiyaG avatar Apr 05 '24 22:04 ShiromiyaG

I was testing HQ4, and it has the same problem, on both Windows and Linux. It looks like the semitone_shift is wrong. Also, this message appears:

C:\Users\Guilherme\anaconda3\lib\site-packages\uvr\models_dir\mdx\mdx_interface.py:270: RuntimeWarning: invalid value encountered in divide
  tar_waves = result / divider
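For context, NumPy emits this warning when a division evaluates to NaN, typically 0/0. The snippet below is only a sketch of how such a division can be guarded; it assumes, without having checked the library source, that `result` holds summed overlapping chunks and `divider` counts how many chunks covered each sample. The function name `safe_overlap_normalize` is hypothetical.

```python
import numpy as np


def safe_overlap_normalize(result, divider):
    """Divide result by divider element-wise, leaving 0 where divider is 0.

    Assumption: result and divider have the same shape.
    """
    result = np.asarray(result, dtype=np.float64)
    divider = np.asarray(divider, dtype=np.float64)
    out = np.zeros_like(result)
    # Only divide where the divider is non-zero; other positions keep 0.0
    np.divide(result, divider, out=out, where=divider != 0)
    return out


summed = np.array([0.0, 2.0, 3.0, 0.0])
counts = np.array([0.0, 2.0, 1.0, 0.0])
normalized = safe_overlap_normalize(summed, counts)
```

Whether this warning is related to the slowed-down audio is not established in this thread; the sketch only shows how the NaN values themselves can be avoided.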

ShiromiyaG avatar Apr 16 '24 22:04 ShiromiyaG

I think that might be caused by package versions or resampling algorithms. I noticed that the UVR GUI uses different resampling depending on the OS. I'm not sure why they did it, but I followed them to replicate the same results. As for package versions, unfortunately even using the same versions might not solve the issue, since some libraries have different implementations on different OSs (even at the same version). The workaround that worked for me in the past was to wrap everything in a Docker container, which basically unifies the OS.
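To compare the two environments concretely, the installed versions can be printed on each machine and diffed. This is a minimal sketch using only the standard library; the function name `environment_report` is hypothetical, and the package list is an assumption about which dependencies matter here, so adjust it to the actual requirements file.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Collect OS, Python version, and installed versions of packages.

    Packages that are not installed are reported as None.
    """
    report = {"python": platform.python_version(), "os": platform.system()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Assumed list of audio-related dependencies; edit as needed.
    for key, value in environment_report(
        ["numpy", "librosa", "torch", "soundfile"]
    ).items():
        print(f"{key}: {value}")
```

Running this on both Windows and Colab and comparing the output shows whether the versions actually match. As noted above, identical versions can still behave differently across operating systems.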

Now that I'm back, I'll be working on:

  1. Fixing the bugs you found
  2. Adding new docs
  3. Adding new weights (at least the ones you tested)

So if you can send me an email with your findings and the current bugs, it will help me a lot 🤗. [email protected]

MohannadEhabBarakat avatar Apr 18 '24 15:04 MohannadEhabBarakat

I can try to help, but I don't know how much help I would be, since I don't use most of the models and end up using only a few specific ones. In fact, I tested a model that is not available in the UVR repository but works both in UVR and in your code. If you want to take a look at it, I uploaded it at the link below: https://github.com/ShiromiyaG/RVC-AI-Cover-Maker/releases (it's the karokee model)

ShiromiyaG avatar Apr 19 '24 00:04 ShiromiyaG