PercepNet
Loss increases and becomes NaN
Hi, thanks for your excellent work. I extracted features from speech (PCM, 12 GB) and noise (PCM, 9 GB), and set count to 10000000. Then I ran run_train.py and got the following output:

Can you tell me which dataset you use for training?
Hi, I made some changes, as follows. First, I added some clean music data to the speech, since I want to keep music when denoising. Second, the speech and noise were resampled and re-encoded from non-original 48k WAV sources (such as 8k/16k MP3). Could this impact the training result?
I used original 48k speech (concatenated into one PCM file, 15 GB) and noise (concatenated into one PCM file, 7.8 GB), set count to 10000000, and got an increasing loss and NaN again.
When I set count to 100000, I get the following output:
It seems the loss also increases per iteration but decreases per epoch. Is this normal? When count is large, the NaN seems inevitable.
Hi, I found the problem. The reason for the increasing loss is the following:
```python
# print statistics
running_loss += loss.item()
# for testing
print('[%d, %5d] loss: %.3f' %
      (epoch + 1, i + 1, running_loss))
```
Actually, I don't quite understand why you wrote it like this...
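For what it's worth, the usual pattern is to print the running loss averaged over the iterations seen so far (and to reset it every epoch). A minimal sketch, reusing the variable names from the snippet above:

```python
# print statistics: report the average of the accumulated loss, not the raw sum
running_loss += loss.item()
print('[%d, %5d] loss: %.3f' %
      (epoch + 1, i + 1, running_loss / (i + 1)))
# and set running_loss = 0.0 at the start of every epoch
```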
The reason for the NaN is the CustomLoss.
Hi @YangangCao, yes, I was dumb. I only checked for iter=1, epoch=1, which is why I didn't notice that the printed loss increases per iteration. I fixed it in commit 9de28e0.
For the NaN error: did you check that the extracted features (r, g) are in 0~1? If not, they will produce a NaN loss unless you clip them to 0~1.
Thanks
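To illustrate that check, here is a minimal sketch (not the project's actual loader; the band count and feature layout are assumptions) that warns about and clips gain/strength targets outside [0, 1] before they reach the loss:

```python
import torch

NB_BANDS = 34  # assumed band count, matching the 34-wide slices discussed below

def sanitize_targets(targets: torch.Tensor) -> torch.Tensor:
    """Clip the per-band gain/strength part of a (batch, time, feature) target tensor to [0, 1]."""
    gb_rb = targets[:, :, :2 * NB_BANDS]  # assumed: gains and strengths come first
    if (gb_rb < 0).any() or (gb_rb > 1).any():
        print('warning: gain/strength targets outside [0, 1]; clipping')
    out = targets.clone()
    out[:, :, :2 * NB_BANDS] = gb_rb.clamp(0.0, 1.0)
    return out
```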
I have checked the features extracted from the original 48k WAV; they all range from 0 to 1 (floating-point values, lots of 0s and sparse 1s). When I set the count of extracted features to 1e5, no NaN appears (I tried more than once). However, when I set it to 1e6 or 1e7, NaN appears again. I am not sure about the relationship between count and NaN.
There is an error in 'rnn_train.py' that makes the loss NaN:
```python
rb = targets[:,:,:34]
gb = targets[:,:,34:68]
```
but in 'denoise.cpp':
```cpp
fwrite(g, sizeof(float), NB_BANDS, f3);  // gain
fwrite(r, sizeof(float), NB_BANDS, f3);  // filtering strength
```
rb < 0, so that torch.pow(gb, 0.5) is nan
You should change the code in 'rnn_train.py' to:
```python
gb = targets[:,:,:34]
rb = targets[:,:,34:68]
```
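To make the ordering explicit, here is a rough sketch of reading such a dump in the same order that 'denoise.cpp' writes it, gains first and strengths second (the file name, frame width, and dtype here are assumptions, not the project's exact format):

```python
import numpy as np

NB_BANDS = 34       # band count used by denoise.cpp
FRAME_WIDTH = 68    # assumed: 34 gains followed by 34 filtering strengths per frame

raw = np.fromfile('training_targets.f32', dtype=np.float32)  # hypothetical dump file
frames = raw[: raw.size // FRAME_WIDTH * FRAME_WIDTH].reshape(-1, FRAME_WIDTH)

gb = frames[:, :NB_BANDS]              # written first:  fwrite(g, ...)
rb = frames[:, NB_BANDS:2 * NB_BANDS]  # written second: fwrite(r, ...)

print('gain range:    ', gb.min(), gb.max())
print('strength range:', rb.min(), rb.max())
```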
> There is an error in 'rnn_train.py' that makes the loss NaN.
Thanks, I've fixed it in #24.
There is a new cause of the NaN loss. The pitch-correlation feature can itself be NaN: the value 'error' can be zero in 'celt_lpc.cpp', which makes the pitch correlation NaN:

```c
r = -SHL32(rr,3)/error;
```

You can add a small bias to 'error' so that it can never be zero:

```c
r = -SHL32(rr,3)/(error + 0.00001);
```
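Independently of the C-side fix, one way to catch this before training is to scan the dumped feature file for NaN frames. A minimal sketch, assuming a raw float32 dump whose per-frame width you supply yourself (the real layout depends on your extractor):

```python
import numpy as np

def report_nan_frames(path: str, frame_width: int) -> None:
    """Print how many frames of a raw float32 feature dump contain NaN values."""
    data = np.fromfile(path, dtype=np.float32)
    usable = len(data) // frame_width * frame_width
    frames = data[:usable].reshape(-1, frame_width)
    bad = np.isnan(frames).any(axis=1)
    print('%d / %d frames contain NaN' % (bad.sum(), len(frames)))

# hypothetical usage; adjust the path and frame width to your own dump
# report_nan_frames('training_features.f32', frame_width=70)
```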