meta-tasnet
Provide trained model in higher resolution
I have now tested the model extensively. Unfortunately, I found that the output signal is always cut off at 22050 Hz, although the output would in theory have a resolution of 32000 Hz. This means the signal does not cover the full range it could have.
Is this due to the trained model, or to some additional setting? I followed the steps described in the Python notebook and double-checked everything. Unfortunately, because of the 22050 Hz output, the quality is not as brilliant as it could be. Here is a short explanation.
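For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of one way to estimate where the spectrum of a signal stops. It is not taken from the meta-tasnet notebook: the function name `estimate_cutoff_hz` and the `threshold_db` parameter are hypothetical, and the demo uses a synthetic two-tone signal instead of real model output.

```python
import numpy as np


def estimate_cutoff_hz(signal, sample_rate, threshold_db=-60.0):
    """Highest frequency whose level is within threshold_db of the spectral peak.

    Hypothetical helper for illustration; threshold_db is an assumed default.
    """
    signal = np.asarray(signal, dtype=np.float64)
    # A Hann window reduces spectral leakage before taking the FFT.
    window = np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    peak = spectrum.max()
    if peak == 0.0:
        return 0.0  # silent input has no measurable bandwidth

    # Level of every bin relative to the strongest bin, in dB.
    level_db = 20.0 * np.log10(np.maximum(spectrum, 1e-12) / peak)
    audible_bins = np.nonzero(level_db >= threshold_db)[0]
    return float(freqs[audible_bins[-1]])


# Synthetic demo: one second at 32 kHz with tones at 1 kHz and 9 kHz.
sample_rate = 32000
t = np.arange(sample_rate) / sample_rate
demo = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 9000 * t)
cutoff = estimate_cutoff_hz(demo, sample_rate)
print(f"estimated cut-off: {cutoff:.0f} Hz (Nyquist: {sample_rate / 2:.0f} Hz)")
```

To use it on a separated track, load one channel into a NumPy array with whatever audio reader you prefer and pass the file's sample rate. Comparing the result against the Nyquist frequency (half the sample rate) shows how much of the available range is actually used.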
Hi, you're right, there's a clear cut-off after 10 kHz, but we're unsure about its cause. It seems to be an internal property of the neural network. Please let me know if you catch the bug :)
I downloaded the MusDB tracks and checked whether the sum of the stems equals the stereo mix. It does not, and the difference between the stereo mix and the sum of the stems is too big to ignore: you can identify the song just by listening to that difference. So I believe that computing the weights of the TCN mask on MusDB instead of MusDB HQ adds an error to the model.
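The stem-sum check described above could be sketched roughly like this. This is only an illustration under my own assumptions: `residual_db` is a hypothetical helper, the arrays are random stand-ins shaped `(samples, channels)`, and no MusDB data is loaded here.

```python
import numpy as np


def residual_db(mix, stems):
    """Level of (mix - sum of stems) relative to the mix, in dB.

    Hypothetical helper; a more negative value means a closer match.
    """
    mix = np.asarray(mix, dtype=np.float64)
    stem_sum = np.sum([np.asarray(s, dtype=np.float64) for s in stems], axis=0)
    residual = mix - stem_sum

    mix_power = np.mean(mix ** 2)
    if mix_power == 0.0:
        raise ValueError("mix is silent, relative level is undefined")
    residual_power = np.mean(residual ** 2)
    # The floor avoids log10(0) when the stems sum to the mix exactly.
    return 10.0 * np.log10(max(residual_power, 1e-20) / mix_power)


# Stand-in data: four random "stems" with 1000 stereo samples each.
rng = np.random.default_rng(0)
stems = [0.1 * rng.standard_normal((1000, 2)) for _ in range(4)]

exact_mix = np.sum(stems, axis=0)  # a mix that is exactly the stem sum
altered_mix = exact_mix + 0.01 * rng.standard_normal((1000, 2))  # a mix that is not

exact_db = residual_db(exact_mix, stems)
altered_db = residual_db(altered_mix, stems)
print(f"exact mix:   {exact_db:.1f} dB")
print(f"altered mix: {altered_db:.1f} dB")
```

With real tracks you would pass the decoded mixture and the decoded stems of the same song. A residual only a few tens of dB below the mix would match the observation that the song is still recognizable in the difference signal.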
Also, why not try adding another step to the network? MusDB cuts the frequencies above 16 kHz, so the model is not trained to work with audio that has information above that frequency. Could a network with these parameters work?
S = 10; 1, T/2, sr=6000 Hz
S = 20; 1, T, sr=12000 Hz
S = 40; 1, 2T, sr=24000 Hz
S = 80; 1, 4T, sr=48000 Hz
Has anyone run this test?
There was a similar issue to this with Spleeter, where high frequencies are not present in output files. Here's their explanation: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-
@davda54 Could this issue be similar to that?