SCNet-PyTorch
Unable to reproduce the results of the paper
I tried to train the model, but the average SDR is only about 6.
Here are my specific results:
Hi!
Did you try to train the model from this repository?
Also, I did not add the augmentation part, but it would probably improve the quality.
I only used the model definition part. The augmentations are from https://github.com/adefossez/demucs/blob/main/demucs/augment.py. According to the paper, remix and scale are used.
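For reference, here is a minimal, self-contained sketch of what those two augmentations do on a batch of stems. This is not the demucs code itself; the tensor layout and gain range are illustrative assumptions:

```python
import torch

def remix(stems: torch.Tensor) -> torch.Tensor:
    """stems: (batch, sources, channels, time).
    Shuffle each source independently across the batch, so every example
    becomes a new mixture of sources taken from different tracks."""
    batch = stems.shape[0]
    out = torch.empty_like(stems)
    for s in range(stems.shape[1]):
        out[:, s] = stems[torch.randperm(batch, device=stems.device), s]
    return out

def scale(stems: torch.Tensor, low: float = 0.25, high: float = 1.25) -> torch.Tensor:
    """Apply a random per-example gain (range is an illustrative guess)."""
    gains = torch.empty(stems.shape[0], 1, 1, 1, device=stems.device).uniform_(low, high)
    return stems * gains
```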
I had the same poor results. I used the default model parameters from the repo and trained only a vocals model with my own MSS code: https://github.com/ZFTurbo/Music-Source-Separation-Training
My result after one day of training on 3 GPUs was ~6.7 SDR for vocals. Some notes:
- The authors said they used a batch size of 4 per GPU, while I could fit 30. I think the paper's model was much bigger.
- The weights file on disk is only 37 MB, which also suggests that the model settings most likely differ.
Also, the paper's authors stated that the small model has 10.08M parameters, while this implementation has ~9.46M.
I still think the model itself is implemented correctly; the mismatch may be a sign that some settings are defined differently in the paper.
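For anyone double-checking the count against the paper's 10.08M, a quick sketch (the `model` argument stands for whatever SCNet instance you build from this repo):

```python
import torch.nn as nn

def count_params_m(model: nn.Module) -> float:
    """Total trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# e.g. count_params_m(model) -> ~9.46 for this repo's defaults vs 10.08 in the paper
```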
I have not had stellar results yet either, though I am continuing to train. I am running into very striated spectrograms for the output stems (like the attached image). Has anyone else run into this and does it resolve with further training?
Also, they don't mention it in the paper, but if they are using EMA, the memory usage begins to make more sense.
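For context, a minimal sketch of the kind of weight EMA that would explain the extra memory, since it keeps a full shadow copy of the weights. The decay value is an assumption; the paper says nothing about any of this:

```python
import torch
import torch.nn as nn

class WeightEMA:
    """Keeps an exponential moving average of a model's weights.
    The shadow copy roughly doubles the weight memory footprint."""

    def __init__(self, model: nn.Module, decay: float = 0.999):
        self.decay = decay
        self.shadow = {k: v.detach().clone()
                       for k, v in model.state_dict().items()}

    @torch.no_grad()
    def update(self, model: nn.Module) -> None:
        for k, v in model.state_dict().items():
            if v.dtype.is_floating_point:
                self.shadow[k].mul_(self.decay).add_(v, alpha=1 - self.decay)
            else:
                self.shadow[k].copy_(v)
```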
The parameters in scnet.py could be incorrect: https://github.com/amanteur/SCNet-PyTorch/blob/main/src/model/scnet.py#L150
1. The ordering of the parameters in lines 152-155 appears inconsistent. Specifically,
"dims": [4, 32, 64, 128] (edit: assuming low to high, this one is correct)
"bandsplit_ratios": [0.175, 0.392, 0.433]
"downsample_strides": [1, 4, 16]
do not correspond to
"n_conv_modules": [3, 2, 1]
I am unsure whether the splitting of the frequency dimension is iterated from low to high frequency, but I would appreciate it if @amanteur could double-check this; see the sketch below.
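To make the pairing question concrete, here is a small sketch of how the ratios would carve up the frequency axis if iterated low to high, with each band matched by position to a stride and a conv-module count. The bin count is illustrative:

```python
n_freqs = 2048  # illustrative number of STFT frequency bins
bandsplit_ratios = [0.175, 0.392, 0.433]  # assumed low -> high, per the config
downsample_strides = [1, 4, 16]
n_conv_modules = [3, 2, 1]

start = 0
for ratio, stride, n_conv in zip(bandsplit_ratios, downsample_strides, n_conv_modules):
    end = start + round(ratio * n_freqs)
    print(f"bins [{start}:{end}] -> stride {stride}, {n_conv} conv modules")
    start = end
```

Read this way, the low band gets stride 1 and the most conv modules; the concern above is that the implementation may pair them the other way around.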
edit 2: After trying to figure out why crlandsc's spectrogram is so messed up in the lower frequencies, I am more confused than before. I think there is a mismatch between the params and the implementation, with two possibilities:
1. The low-frequency band goes through all three convolution modules, which means it gets compressed by 70% three times.
2. The low-frequency band goes through the SD-block strides of 4/16 instead of 1.
edit 3: I was wrong about that. I still think something inside the SD block is off, though, whether it is the hyperparameters or the code itself.
@amanteur Any news regarding this paper? Have you also gotten results worse than those reported in the paper?
Hi, how is this .pqt file generated?
Hey!
Sorry for such a delayed response. I was really busy with my work, so I didn't have time to check on the repository and train a new model.
@mapperize Thank you for investigating my code! It definitely has some bugs here and there, and I will look into that this weekend.
@JuYangFu It is initialized when you train a model for the first time, more precisely, here: https://github.com/amanteur/SCNet-PyTorch/blob/71d333c7b78ba5adbabfda803165df1b6b43d57a/src/data/dataset.py#L89. So you need to specify, via export DATASET_PATH=/path/to/dataset/dataset.pqt, the path where you want your file to be generated. I will add a note about this to the README.
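In Python terms, the equivalent of that export is simply (the path is a placeholder):

```python
import os

# Placeholder path: the .pqt index is generated here on the first training run
os.environ["DATASET_PATH"] = "/path/to/dataset/dataset.pqt"
```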
No problem, I understand.
@amanteur Hello, I am the author of SCNet. Thank you for your excellent implementation of this work. However, there are some details that differ from our code, such as the kernel size in the SD layer (sorry, I did not mention this in the paper), etc. Our model is now available at this repository for reference.
@starrytong Yeah, I've just started to experiment with kernel sizes in the SD layer :D
Thank you for your work and repo!
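As an illustration of the kind of sweep this enables, here is a purely hypothetical module with made-up values, since the paper does not give the SD-layer kernel size:

```python
import torch
import torch.nn as nn

class SDDownsample(nn.Module):
    """Hypothetical: a strided 1D conv along the frequency axis, with the
    kernel size exposed so it can be swept; values are not the authors'."""

    def __init__(self, channels: int, stride: int, kernel_size: int):
        super().__init__()
        padding = (kernel_size - stride) // 2  # keeps output length ~= input / stride
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              stride=stride, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.conv(x)

# e.g. trying kernel_size in {stride, 2 * stride, 3 * stride} for each band
layer = SDDownsample(channels=32, stride=4, kernel_size=8)
out = layer(torch.randn(1, 32, 256))  # -> (1, 32, 64)
```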
@starrytong Will you be able to provide audio examples or demos of SCNet? Eager to train on it!
Hi, I am experiencing this problem; what could be the reason?
This was an issue with the metrics callback and should be fixed in the code by now. Did you git pull a fresh, updated copy? See https://github.com/amanteur/SCNet-PyTorch/issues/2