FTANet seems to expect a much larger input range than normalized audio

Open genisplaja opened this issue 5 months ago • 0 comments

While running predictions with the melody:ftanet-carnatic model, I noticed the following:

audio, sr = torchaudio.load(audio_file_, normalize=True)
# audio.max() falls in [0, 1] here, since normalize=True returns float samples in [-1, 1]
audio = audio * LOUD_FACTOR
predicted_data = ftanet.predict(audio.numpy(), input_sr=sr)

I only got good predictions with a very large LOUD_FACTOR, around 20. We should look through the code and check whether we are normalizing the input incorrectly somewhere.
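To narrow this down, it may help to log the level of the waveform right before it reaches `ftanet.predict`, with and without the gain. Below is a minimal sketch using only NumPy. `level_stats` and `scale_to_peak` are hypothetical helpers written for this issue, not part of the library, and the synthetic sine only stands in for a quiet recording. It makes no assumption about what FTANet does internally.

```python
import numpy as np


def level_stats(audio):
    """Return peak and RMS of a waveform, both linear and in dBFS.

    Hypothetical helper for debugging, not part of any library.
    """
    x = np.asarray(audio, dtype=np.float64)
    peak = float(np.max(np.abs(x)))
    rms = float(np.sqrt(np.mean(x ** 2)))
    eps = 1e-12  # avoids log10(0) on silent input
    return {
        "peak": peak,
        "rms": rms,
        "peak_dbfs": float(20 * np.log10(max(peak, eps))),
        "rms_dbfs": float(20 * np.log10(max(rms, eps))),
    }


def scale_to_peak(audio, target_peak=1.0):
    """Rescale a waveform so that its absolute peak equals target_peak.

    Hypothetical alternative to a hand-tuned LOUD_FACTOR.
    """
    x = np.asarray(audio, dtype=np.float32)
    peak = np.max(np.abs(x))
    if peak == 0:
        return x.copy()  # silent input: nothing to rescale
    return x * (target_peak / peak)


# Synthetic stand-in for a quiet recording: 1 second of a 220 Hz sine
sr = 44100
t = np.arange(sr) / sr
quiet = (0.05 * np.sin(2 * np.pi * 220 * t)).astype(np.float32)

LOUD_FACTOR = 20  # the value reported above
print("raw:      ", level_stats(quiet))
print("with gain:", level_stats(quiet * LOUD_FACTOR))
print("peak-norm:", level_stats(scale_to_peak(quiet)))
```

If the recordings that need a large LOUD_FACTOR turn out to have a small peak after loading, the gain may simply be compensating for quiet input. If they are already close to full scale, a normalization problem inside the prediction code becomes the more likely explanation.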

Jul 18 '25 23:07 genisplaja