fastmoe: During inference, the output of the noisy gate is NaN.

Open zqhang opened this issue 1 year ago • 5 comments

Training runs smoothly, but inference fails: when self.training is False, noise_stddev becomes zero, which leads to an error (NaN output) when computing the load. Should NoisyGate refrain from adding noise during inference?
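A minimal sketch of how the zero standard deviation can turn into NaN. It assumes the load computation divides a logit difference by noise_stddev, as in the usual noisy top-k gating formulation. The tensors, the threshold values, and the noise_epsilon value are made up for illustration and are not taken from fastmoe.

```python
import torch

softplus = torch.nn.Softplus()
noise_epsilon = 1e-2                          # assumed value, for illustration only
raw_noise_stddev = torch.tensor([0.3, -1.2])  # made-up gate outputs


def noise_stddev_current(training: bool) -> torch.Tensor:
    # Form described in this issue: the whole sum is multiplied by the flag,
    # so the result is exactly zero when training is False.
    return (softplus(raw_noise_stddev) + noise_epsilon) * training


stddev_eval = noise_stddev_current(training=False)

# Hypothetical clean logits and top-k thresholds. The first logit sits exactly
# on its threshold, so the numerator is zero as well.
clean = torch.tensor([1.0, 2.0])
threshold = torch.tensor([1.0, 0.5])

# Assumed load term: (clean - threshold) / noise_stddev.
ratio = (clean - threshold) / stddev_eval

print(stddev_eval)         # both entries are zero
print(torch.isnan(ratio))  # 0 / 0 gives NaN in the first entry
print(torch.isinf(ratio))  # 1.5 / 0 gives inf in the second entry
```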

Dec 03 '23 03:12 zqhang

@Sengxian Can you please shed some light on why we are multiplying the noise by self.training here?

Dec 04 '23 06:12 laekov

I suppose it should be raw_noise * training + eps instead of (raw_noise + eps) * training

Dec 04 '23 06:12 laekov

Do I accurately comprehend your statement: noise_stddev = self.softplus(raw_noise_stddev) * self.training + self.noise_epsilon ?

Dec 04 '23 07:12 zqhang

> Do I accurately comprehend your statement: noise_stddev = self.softplus(raw_noise_stddev) * self.training + self.noise_epsilon ?

Yes, I think that can help fix your NaN issue. But as I am not an algorithm person, I am not sure if this is how the noisy gate is expected to behave during inference.
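A minimal sketch of the proposed form, noise_stddev = softplus(raw_noise_stddev) * training + noise_epsilon, written as a standalone function rather than as the actual NoisyGate code. The input tensors, the threshold values, and the noise_epsilon value are assumptions chosen for illustration.

```python
import torch

softplus = torch.nn.Softplus()
noise_epsilon = 1e-2  # assumed value, for illustration only


def noise_stddev_proposed(raw_noise_stddev: torch.Tensor, training: bool) -> torch.Tensor:
    # Only the learned part is gated by the training flag; the epsilon is
    # always added, so the result never reaches zero.
    return softplus(raw_noise_stddev) * training + noise_epsilon


raw = torch.tensor([0.3, -1.2])  # made-up gate outputs
stddev_train = noise_stddev_proposed(raw, training=True)
stddev_eval = noise_stddev_proposed(raw, training=False)

# Same hypothetical load term as a division by noise_stddev.
clean = torch.tensor([1.0, 2.0])
threshold = torch.tensor([1.0, 0.5])
ratio = (clean - threshold) / stddev_eval

print(stddev_eval)                  # every entry equals noise_epsilon
print(torch.isfinite(ratio).all())  # the division stays finite
```

With this form the standard deviation at inference is noise_epsilon rather than zero, so a gate that still samples noise would add a small amount of it. Whether that matches the intended inference behavior is the question left open in this thread.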

Dec 04 '23 07:12 laekov

Thank you for your help

Dec 04 '23 07:12 zqhang
