my_crossentropy issue

Open xiaoyaoxiaoxian opened this issue 4 years ago • 10 comments

Hi, I have a question about the my_crossentropy loss, shown below. When I check the documented usage of K.binary_crossentropy, I find that y_true should be passed in the first position, but here y_pred comes first. Could anyone explain whether this is intentional?

def my_crossentropy(y_true, y_pred):
    return K.mean(2*K.abs(y_true-0.5) * K.binary_crossentropy(y_pred, y_true), axis=-1)

Usage from the guide:

keras.losses.binary_crossentropy(y_true, y_pred, from_logits=False, label_smoothing=0)
tf.keras.backend.binary_crossentropy(target, output, from_logits=False)
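For comparison, the same loss with the documented argument order could look like the sketch below (my own illustration, not code from the repository; K is keras.backend):

from keras import backend as K

def my_crossentropy_fixed(y_true, y_pred):
    # Same weighting as in the repository's loss, but with the documented
    # argument order for binary_crossentropy (target first, prediction second).
    return K.mean(2 * K.abs(y_true - 0.5) * K.binary_crossentropy(y_true, y_pred), axis=-1)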

xiaoyaoxiaoxian avatar Apr 01 '20 06:04 xiaoyaoxiaoxian

[plot: cross_entropy_fail]

I created a plot to show the difference. The blue and orange lines show the result of the correct usage; the red and green ones show the RNNoise behaviour.

You're probably right that this wasn't intended.
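A comparison like that can be reproduced with a small script along these lines (my own sketch, not the original plotting code; the bce helper only approximates what the Keras backend computes, including the clipping of the output argument):

import numpy as np
import matplotlib.pyplot as plt

eps = 1e-7

def bce(target, output):
    # Roughly what K.binary_crossentropy(target, output) computes:
    # the output argument is clipped before the logs are taken.
    output = np.clip(output, eps, 1.0 - eps)
    return -(target * np.log(output) + (1.0 - target) * np.log(1.0 - output))

p = np.linspace(0.0, 1.0, 500)  # predicted probability
for label in (0.0, 1.0):
    plt.plot(p, bce(label, p), label='correct order, y_true=%g' % label)
    plt.plot(p, bce(p, label), label='swapped order, y_true=%g' % label)
plt.xlabel('prediction')
plt.ylabel('loss')
plt.legend()
plt.show()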

Zadagu avatar Apr 01 '20 14:04 Zadagu

Hi Zadagu, I have one more question. I see that -1 can be assigned to the gain label vector g. Is the mycost loss still correct in that case, given that K.sqrt(y_true) is calculated? There is a risk that K.sqrt(-1) gets evaluated, which should produce NaN, yet when I run training no NaN seems to occur.

xiaoyaoxiaoxian avatar Apr 02 '20 02:04 xiaoyaoxiaoxian

The gain of -1 is set if there is no (or only low) signal energy in either the clean speech or the noise. In that case it doesn't matter which gain is applied, because the signal isn't audible. There are also other cases where a gain of -1 is set; you can look them up in denoise.c.

During training, the gains of -1 are ignored by multiplying the loss per gain with mymask(ground_truth_gains). For every gain in the range 0 to 1, mymask returns 1; for every -1 it returns 0. Multiplying this mask into the per-gain loss therefore removes all contributions from gains of -1.
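A minimal sketch of this masking (my own illustration with keras.backend; the per-gain loss term here is a placeholder, not the repository's actual mycost formula):

from keras import backend as K

def mymask(y_true):
    # 1 for gains in [0, 1], 0 for the -1 "ignore" flag.
    return K.minimum(y_true + 1., 1.)

def masked_gain_loss(y_true, y_pred):
    per_gain = K.square(y_pred - y_true)  # placeholder per-gain loss
    # Bands flagged with -1 get mask 0 and contribute nothing to the mean.
    return K.mean(mymask(y_true) * per_gain, axis=-1)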

Zadagu avatar Apr 02 '20 07:04 Zadagu

Thank you, Zadagu. I find that in the rnn_data.c provided in the repo the GRU activation is ReLU, while in the training model the activation function is tanh. Does that mean ReLU gives better performance? Also, may I ask how many hours of training data you used and what the best loss you could reach was?

xiaoyaoxiaoxian avatar Apr 03 '20 02:04 xiaoyaoxiaoxian

These questions have already been answered.

Tanh vs ReLU: https://github.com/xiph/rnnoise/issues/58 https://github.com/xiph/rnnoise/issues/79

For the amount of training data see: https://jmvalin.ca/papers/rnnoise_mmsp2018.pdf

Zadagu avatar Apr 03 '20 10:04 Zadagu

Thanks. I checked the loss in the training code in detail. Using binary_crossentropy that way seems to somewhat boost the VAD loss, and in my training runs the overall loss can hardly get below 0.9. Is that a good loss, or should I change the code of my_crossentropy?

xiaoyaoxiaoxian avatar Apr 03 '20 15:04 xiaoyaoxiaoxian

Following this, I changed my_crossentropy() so that y_true is the first argument and y_pred the second argument to K.binary_crossentropy; the original was:

def my_crossentropy(y_true, y_pred):
    return K.mean(2*K.abs(y_true-0.5) * K.binary_crossentropy(y_pred, y_true), axis=-1)

But when I run my ./rnn_train.py script, all the losses come out as NaN:

2016/22500 [=>............................] - ETA: 20:30 - loss: nan - denoise_output_loss: nan - vad_output_loss: nan - denoise_output_msse: nan - vad_output_msse: nan

Any insights into what could be going wrong would be appreciated. If I instead keep y_pred as the first argument to the Keras binary cross-entropy, I do see loss values as the iterations start (higher with ReLU and lower with tanh), but I tend to get NaN values every epoch.

sporwar-lifesize avatar Apr 05 '20 22:04 sporwar-lifesize

Yes, I also hit the NaN issue when I use ReLU, but there is no NaN when using tanh. I tried several things (reducing the learning rate, etc.), but it was not resolved. As far as I can tell, the gradient suddenly goes to NaN at some iteration, although the previous gradients are fine. The NaN can appear at a random point, even when the data is not shuffled. After a few tries I changed TensorFlow to version 2.1, and there is NaN even when I use ReLU. So far I still don't know the root cause.

xiaoyaoxiaoxian avatar Apr 07 '20 07:04 xiaoyaoxiaoxian

Hi Zadagu, I still have a question about the gain. The training targets can contain gains of -1, but the activation function of denoise_output is a sigmoid, which ranges from 0 to 1. That means gain_pred can only be trained towards (0, 1) while gain_true is actually -1. Would that affect the denoised result?

erin1109 avatar Sep 10 '20 10:09 erin1109

No, it doesn't affect the result. The -1 is just a hack/flag to mark values that shouldn't influence the error.
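A toy check of that behaviour, using plain NumPy and the placeholder masked loss sketched earlier (my illustration, not the actual training code):

import numpy as np

y_true = np.array([0.8, -1.0, 0.2])   # -1 flags an "ignore" band
y_pred = np.array([0.5,  0.9, 0.2])   # sigmoid outputs, always in (0, 1)

mask = np.minimum(y_true + 1.0, 1.0)  # [1., 0., 1.]
per_band = (y_pred - y_true) ** 2     # placeholder per-band loss
print(np.mean(mask * per_band))       # the flagged band contributes nothing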

Zadagu avatar Sep 10 '20 11:09 Zadagu