During inference, the output of the noisy gate is NaN.

Open zqhang opened this issue 1 year ago • 5 comments

The training process proceeds smoothly; however, an issue arises during inference as the noise_stddev becomes zero when self.training is False, leading to an error when computing the load. Should we refrain from adding noise in the NoisyGate during inference?

zqhang avatar Dec 03 '23 03:12 zqhang
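
The failure described above can be illustrated with a scalar sketch in plain Python. This is not the fastmoe source: the function names, the `eps` default, and the `ieee_div` helper (which imitates tensor floating-point division, where 0/0 yields NaN) are assumptions made for illustration. It assumes the load computation evaluates a normal CDF of `(clean - threshold) / noise_stddev`.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def ieee_div(a: float, b: float) -> float:
    # Hypothetical helper: imitate tensor float division, where
    # 0/0 -> nan and x/0 -> +/-inf (plain Python would raise instead).
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan
        return math.copysign(math.inf, a)
    return a / b

def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def noise_stddev_original(raw: float, training: bool, eps: float = 1e-2) -> float:
    # (softplus(raw) + eps) * training: the whole term is zeroed in eval mode.
    return (softplus(raw) + eps) * float(training)

def prob_above_threshold(clean: float, threshold: float, stddev: float) -> float:
    # Assumed shape of the load term: CDF of a difference divided by stddev.
    return normal_cdf(ieee_div(clean - threshold, stddev))

std_train = noise_stddev_original(0.0, training=True)
std_eval = noise_stddev_original(0.0, training=False)   # exactly 0.0

# A logit that equals its threshold gives 0/0 in eval mode.
p_train = prob_above_threshold(1.0, 1.0, std_train)
p_eval = prob_above_threshold(1.0, 1.0, std_eval)       # nan
```

In training mode the standard deviation is strictly positive, so the division is well defined; in eval mode it is exactly zero, and any logit that ties with its threshold produces NaN.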

@Sengxian Can you please shed some light on why we are multiplying the noise with self.training here?

laekov avatar Dec 04 '23 06:12 laekov

I suppose it should be raw_noise * training + eps instead of (raw_noise + eps) * training

laekov avatar Dec 04 '23 06:12 laekov

Do I accurately comprehend your statement: noise_stddev = self.softplus(raw_noise_stddev) * self.training + self.noise_epsilon ?

zqhang avatar Dec 04 '23 07:12 zqhang

Yes, I think that can help fix your nan issue. But as I am not an algorithm person, I am not sure whether this is how the noisy gate is expected to behave during inference.

laekov avatar Dec 04 '23 07:12 laekov
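
A minimal scalar sketch of the two formulas discussed in this thread, in plain Python rather than the actual fastmoe code. The function names and the `eps` default are hypothetical. It only shows that moving `eps` outside the multiplication keeps the standard deviation strictly positive in eval mode.

```python
import math

def softplus(x: float) -> float:
    # Numerically stable softplus: log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def normal_cdf(z: float) -> float:
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def stddev_original(raw: float, training: bool, eps: float = 1e-2) -> float:
    # (softplus(raw) + eps) * training: zero when training is False.
    return (softplus(raw) + eps) * float(training)

def stddev_proposed(raw: float, training: bool, eps: float = 1e-2) -> float:
    # softplus(raw) * training + eps: never below eps.
    return softplus(raw) * float(training) + eps

eps = 1e-2
orig_eval = stddev_original(0.0, training=False, eps=eps)   # 0.0
fixed_eval = stddev_proposed(0.0, training=False, eps=eps)  # eps

# Both variants agree in training mode.
orig_train = stddev_original(0.0, training=True, eps=eps)
fixed_train = stddev_proposed(0.0, training=True, eps=eps)

# With the proposed form, the division in the load term is well defined.
p_tie = normal_cdf((1.0 - 1.0) / fixed_eval)
p_above = normal_cdf((1.5 - 1.0) / fixed_eval)
```

With the proposed form the eval-mode standard deviation equals `eps`, so the CDF stays finite. If the gate also scales its sampled noise by this value, inference logits would still be perturbed slightly; whether that is the intended inference behavior is the open question raised in this thread.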

Thank you for your help

zqhang avatar Dec 04 '23 07:12 zqhang