meta-tasnet

Provide trained model in higher resolution

Open FSharpCSharp opened this issue 5 years ago • 4 comments

I have now carried out extensive tests with the model. Unfortunately, I found that the output signal is always cut off at 22050 Hz, although in theory the output could have a resolution of 32000 Hz. This means the signal does not cover the full range it could actually have.

Is this due to the trained model, or to some additional settings? I followed the steps described in the Python notebook and double-checked everything. Unfortunately, the quality is not as brilliant as it could be because of the 22050 Hz cutoff in the output. Here is a short explanation.

FSharpCSharp, Feb 26 '20 09:02

Hi, you're right, there's a clear cut-off after 10 kHz, but we're unsure about its cause. It seems to be an internal property of the neural network. Please let me know if you catch the bug :)
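For anyone who wants to put a number on the cut-off rather than read it off a spectrogram, here is a minimal sketch. It assumes the separated output is already loaded as a mono NumPy array; `estimate_cutoff_hz`, the -60 dB threshold, and the synthetic test signal are illustrative choices, not part of the meta-tasnet code.

```python
import numpy as np

def estimate_cutoff_hz(signal, sample_rate, threshold_db=-60.0):
    """Return the highest frequency whose level is within threshold_db of the spectral peak."""
    # Hann window to limit spectral leakage before the FFT.
    window = np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(signal * window))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Level of every bin relative to the strongest bin, in dB.
    level_db = 20.0 * np.log10(spectrum / spectrum.max() + 1e-12)
    above = np.nonzero(level_db > threshold_db)[0]
    return float(freqs[above[-1]])

# Synthetic stand-in for a model output: one second at 32 kHz
# with no content above 9 kHz (hypothetical values).
sample_rate = 32000
t = np.arange(sample_rate) / sample_rate
demo = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 9000 * t)

cutoff = estimate_cutoff_hz(demo, sample_rate)
print(f"estimated cut-off: {cutoff:.0f} Hz (Nyquist: {sample_rate / 2:.0f} Hz)")
```

Running the same function on the input mix and on the separated output would show whether the high band is already missing in the input or is lost inside the network.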

spectrogram_cutoff

davda54, Mar 11 '20 16:03

I downloaded the MusDB tracks and checked that the sum of the stems is not equal to the stereo mix. Moreover, the difference between the stereo mix and the sum of the stems is too big to ignore: you can identify the song just by listening to that difference. I therefore believe that computing the weights of the TCN mask using MusDB instead of MusDB HQ adds an error to the model.
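The check described above can be sketched roughly as follows. This assumes the mix and the stems are already decoded into NumPy arrays of the same shape; `mix_residual_db` is a hypothetical helper, and the random data and the `tanh` distortion only stand in for real tracks and for whatever processing might make a mix differ from the sum of its stems.

```python
import numpy as np

def mix_residual_db(mix, stems):
    """Energy of (mix - sum of stems) relative to the energy of the mix, in dB."""
    residual = mix - np.sum(stems, axis=0)
    mix_energy = np.sum(mix ** 2)
    residual_energy = np.sum(residual ** 2)
    # Small constant avoids log(0) when the stems add up exactly.
    return 10.0 * np.log10((residual_energy + 1e-12) / (mix_energy + 1e-12))

# Synthetic data: 4 stems, 2 channels, 1000 samples (hypothetical shapes).
rng = np.random.default_rng(0)
stems = rng.standard_normal((4, 2, 1000))

exact_mix = stems.sum(axis=0)       # mix that is exactly the sum of the stems
altered_mix = np.tanh(exact_mix)    # mix that went through some nonlinearity

exact_db = mix_residual_db(exact_mix, stems)
altered_db = mix_residual_db(altered_mix, stems)
print(f"exact mix:   {exact_db:.1f} dB")
print(f"altered mix: {altered_db:.1f} dB")
```

A strongly negative value means the stems add up to the mix; a value close to 0 dB or above means the residual carries about as much energy as the mix itself, which would match the observation that the song is recognisable in the difference.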

Also, why not try adding another stage to the network? MusDB cuts off the frequencies above 16 kHz, so the model is not trained to work with audio that contains information above that frequency. Could a network with these parameters work?

S = 10; 1, T/2, sr = 6000 Hz
S = 20; 1, T, sr = 12000 Hz
S = 40; 1, 2T, sr = 24000 Hz
S = 80; 1, 4T, sr = 48000 Hz
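To make the arithmetic behind these proposed stages explicit, here is a small sketch. The comment does not define S, so the code assumes it is a length in samples (for example an encoder stride or window); under that assumption it reports the Nyquist limit and the duration S / sr of each stage. The names `stages` and `describe_stage` are illustrative.

```python
# (S, sample rate in Hz) pairs copied from the proposal above.
stages = [(10, 6000), (20, 12000), (40, 24000), (80, 48000)]

def describe_stage(S, sample_rate):
    """Nyquist limit and duration of S samples for one stage (S assumed to be in samples)."""
    return {
        "S": S,
        "sample_rate": sample_rate,
        "nyquist_hz": sample_rate / 2,           # highest representable frequency
        "window_ms": 1000.0 * S / sample_rate,   # duration covered by S samples
    }

summary = [describe_stage(S, sr) for S, sr in stages]
for row in summary:
    print(row)
```

Under this reading, every stage spans the same duration (about 1.67 ms), and only the 48000 Hz stage has a Nyquist limit (24000 Hz) above the 16 kHz cutoff mentioned for MusDB.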

RadioAngurem, Mar 18 '20 11:03

Has anyone tested this?

coincoin73, Jul 05 '20 10:07

There was a similar issue to this with Spleeter, where high frequencies are not present in output files. Here's their explanation: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

@davda54 Could this issue be similar to that?

JeffreyCA, Oct 12 '20 20:10