compIAM
Something weird is going on with FTANet's expected input range
I was running some predictions with the melody:ftanet-carnatic model and noticed the following:
import torchaudio
LOUD_FACTOR = 20  # roughly the gain I needed to get good predictions
audio, sr = torchaudio.load(audio_file_, normalize=True)
# At this point audio.max() lies somewhere in [0, 1]
audio = audio * LOUD_FACTOR
predicted_data = ftanet.predict(audio.numpy(), input_sr=sr)
I only got good predictions when using a really large LOUD_FACTOR, around 20. Maybe we should look inside the code and check whether we are normalizing incorrectly somewhere!
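To narrow this down, a small gain sweep might help show where predictions start to look reasonable. This is only a rough sketch: `voiced_ratio` and `sweep_gain` are hypothetical helper names, and it assumes the prediction comes back as rows of (time, frequency) with 0 marking unvoiced frames, which I have not verified against the FTANet code.

```python
def voiced_ratio(pitch_track):
    """Fraction of frames with a non-zero predicted frequency.

    Assumption: pitch_track is a sequence of (time, frequency) rows
    and unvoiced frames are reported as 0.
    """
    freqs = [float(f) for _, f in pitch_track]
    return sum(f > 0 for f in freqs) / len(freqs) if freqs else 0.0


def sweep_gain(predict_fn, audio, sr, gains=(1, 2, 5, 10, 20)):
    """Run predict_fn on audio scaled by each gain; return {gain: voiced ratio}."""
    return {g: voiced_ratio(predict_fn(audio * g, input_sr=sr)) for g in gains}


# Usage with the snippet above (not run here):
# print(sweep_gain(ftanet.predict, audio.numpy(), sr))
```

If the voiced ratio jumps sharply at some gain, that would suggest the model expects input at a different amplitude scale than the one `normalize=True` produces.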