laekov comments

Results: 38 comments of laekov

During inference, the output of noisy gate is nan.

@Sengxian Can you please shed some light on why we are multiplying the noise with `self.training` [here](https://github.com/laekov/fastmoe/blame/master/fmoe/gates/noisy_gate.py#L113)?

During inference, the output of noisy gate is nan.

I suppose it should be `raw_noise * training + eps` instead of `(raw_noise + eps) * training`.
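The difference between the two expressions can be sketched in plain Python. This is an illustration only, not the FastMoE code: the helper names, the scalar `softplus`, and the value of `NOISE_EPSILON` are assumptions. It also assumes the NaN comes from the standard deviation becoming exactly zero in eval mode and later being used as a divisor, which the thread does not confirm.

```python
import math

# Hypothetical value chosen for illustration; not taken from FastMoE.
NOISE_EPSILON = 1e-2


def softplus(x: float) -> float:
    """Scalar softplus, log(1 + exp(x)); always positive."""
    return math.log1p(math.exp(x))


def stddev_current(raw: float, training: bool, eps: float = NOISE_EPSILON) -> float:
    """(raw_noise + eps) * training: the whole term vanishes in eval mode."""
    return (softplus(raw) + eps) * training


def stddev_proposed(raw: float, training: bool, eps: float = NOISE_EPSILON) -> float:
    """raw_noise * training + eps: eps survives when training is False."""
    return softplus(raw) * training + eps


for mode in (True, False):
    print(mode, stddev_current(0.0, mode), stddev_proposed(0.0, mode))
```

In training mode both forms give the same value. In eval mode the first collapses to exactly zero, while the second keeps a small positive floor, so any later division by the standard deviation stays finite.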

During inference, the output of noisy gate is nan.

> Do I accurately comprehend your statement: `noise_stddev = self.softplus(raw_noise_stddev) * self.training + self.noise_epsilon`?

Yes, I think that can help fix your NaN issue. But as I am not...

pytest error

Which test is this error produced by?

pytest error

I am not able to reproduce this issue. Maybe you need to verify that the NCCL version bundled with your PyTorch matches the NCCL version that you use to compile FastMoE....

how to use balance loss?

Sorry for the late reply. The [`BaseGate` module](https://github.com/laekov/fastmoe/blob/master/fmoe/gates/base_gate.py) has methods including `set_loss`, `get_loss` and `has_loss`. A customized gate (or any gate in FastMoE with a balance loss) uses `self.set_loss` to...
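One way such a trio of methods might be used is sketched below. `SketchGate` is a hypothetical stand-in written in plain Python, not the FastMoE `BaseGate`; its method bodies, the `clear` argument, the toy penalty, and the `total_loss` helper with its `balance_weight` parameter are all assumptions made for illustration.

```python
class SketchGate:
    """Hypothetical stand-in for a gate; NOT the FastMoE class."""

    def __init__(self):
        self._loss = None

    def set_loss(self, loss):
        # Called by the gate itself during its forward pass.
        self._loss = loss

    def has_loss(self):
        return self._loss is not None

    def get_loss(self, clear=True):
        # Assumed behavior: return the stored loss and optionally reset it.
        loss = self._loss
        if clear:
            self._loss = None
        return loss

    def forward(self, expert_fractions):
        # Toy balance penalty: sum of squared fractions, smallest when
        # tokens are spread uniformly across experts.
        self.set_loss(sum(f * f for f in expert_fractions))
        return expert_fractions


def total_loss(task_loss, gates, balance_weight=0.01):
    """Add each gate's stored balance loss to the task loss."""
    total = task_loss
    for gate in gates:
        if gate.has_loss():
            total += balance_weight * gate.get_loss()
    return total


gate = SketchGate()
gate.forward([0.5, 0.5])
print(total_loss(1.0, [gate], balance_weight=1.0))
```

The pattern is that the gate records its auxiliary loss as a side effect of the forward pass, and the training loop collects it afterwards and adds it to the main objective.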

A bug in switch_gate

Only 204 unique tokens (vocabulary size) in enwik8 (transformer-XL example)