laekov
@Sengxian Can you please shed some light on why we are multiplying the noise with `self.training` [here](https://github.com/laekov/fastmoe/blame/master/fmoe/gates/noisy_gate.py#L113)?
I suppose it should be `raw_noise * training + eps` instead of `(raw_noise + eps) * training`.
> Do I accurately comprehend your statement: `noise_stddev = self.softplus(raw_noise_stddev) * self.training + self.noise_epsilon`?

Yes, I think that can help fix your NaN issue. But as I am not...
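To make the difference between the two orderings concrete, here is a minimal sketch in plain Python rather than PyTorch. The function names and the `eps` default are hypothetical, chosen for illustration only. It shows that multiplying the whole sum by the training flag drives the standard deviation to exactly zero in eval mode, while adding `eps` after the multiplication keeps it strictly positive. Whether a zero standard deviation produces the NaN depends on how the value is used downstream (for example as a divisor), which is an assumption here.

```python
import math


def softplus(x: float) -> float:
    """softplus(x) = log(1 + exp(x)), always positive."""
    return math.log1p(math.exp(x))


def stddev_scaled(raw: float, training: bool, eps: float = 1e-2) -> float:
    # (softplus(raw) + eps) * training: the epsilon is zeroed out in eval mode.
    return (softplus(raw) + eps) * training


def stddev_offset(raw: float, training: bool, eps: float = 1e-2) -> float:
    # softplus(raw) * training + eps: the epsilon survives in eval mode.
    return softplus(raw) * training + eps


raw = 0.5  # hypothetical pre-activation value
for training in (True, False):
    print(training, stddev_scaled(raw, training), stddev_offset(raw, training))
```

In training mode both expressions give the same value; they only diverge when `training` is `False`.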
Which test is this error produced by?
I am not able to reproduce this issue. Maybe you need to verify that the NCCL version of your PyTorch matches the NCCL version that you use to compile FastMoE....
Sorry for the late reply. The [`BaseGate` module](https://github.com/laekov/fastmoe/blob/master/fmoe/gates/base_gate.py) has methods including `set_loss`, `get_loss` and `has_loss`. In a customized gate (or gates in FastMoE with balance losses), they use `self.set_loss` to...
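As a rough illustration of the pattern described here, the following is a minimal sketch in plain Python. `SketchGate` is a hypothetical stand-in written for this example and is not FastMoE's `BaseGate`; the `clear` argument, the choice of making `has_loss` a method, and the toy balance loss are all assumptions. The idea it shows is only that a gate stores an auxiliary loss during its forward pass and the training loop collects it afterwards.

```python
class SketchGate:
    """Hypothetical stand-in for a gate base class; not FastMoE's code."""

    def __init__(self):
        self.loss = None

    def set_loss(self, loss):
        self.loss = loss

    def get_loss(self, clear=True):
        # Assumption: reading the loss optionally resets it for the next step.
        loss = self.loss
        if clear:
            self.loss = None
        return loss

    def has_loss(self):
        return self.loss is not None


class BalancedGate(SketchGate):
    """Toy gate that records a balance penalty while routing."""

    def forward(self, expert_counts):
        total = sum(expert_counts)
        fractions = [c / total for c in expert_counts]
        uniform = 1.0 / len(expert_counts)
        # Toy penalty: squared deviation of expert load from a uniform split.
        self.set_loss(sum((f - uniform) ** 2 for f in fractions))
        return fractions


gate = BalancedGate()
gate.forward([6, 2])
if gate.has_loss():
    aux = gate.get_loss()  # would be added to the task loss in a training loop
```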
That is a good point. I think you are right. Can you please open a pull request on this? Thanks. BTW, I am also wondering if the capacity calculation in...
@Peg-Wu As you are not using torch distributed, `world_size` has to be `1`.
@Peg-Wu Refer to [this test](https://github.com/laekov/fastmoe/blob/master/tests/test_ddp.py#L67)
@xptree any ideas on this?