SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.11 to 1.9.0. Release notes, sourced from pypa/gh-action-pypi-publish's releases: v1.9.0. 💅 Cosmetic Output Improvements: @woodruffw💰 updated the tense on password nudge in #234; @shenxianpeng💰 helped us disable...
**Describe the bug** A shape mismatch occurs in the computation of the auxiliary loss values: https://github.com/kyegomez/SwitchTransformers/blob/36a1ea01448e56242222b68201207a7219d72b4b/switch_transformers/model.py#L70-L74. There, `load` has shape `[num_experts, dim]` while `importance` has shape `[batch_size, dim]`. Testing...
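To illustrate why the two shapes reported above conflict, here is a minimal sketch that checks them against NumPy/PyTorch-style broadcasting rules. It assumes `load` and `importance` are combined elementwise, which the report does not state explicitly. The helper `broadcast_shape` and the sizes `num_experts = 4`, `batch_size = 8`, `dim = 16` are hypothetical and chosen only for illustration; none of this is taken from the repository's code.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the elementwise-broadcast shape of shapes `a` and `b`.

    Shapes are aligned from the last axis; two sizes are compatible
    when they are equal or when one of them is 1. Raises ValueError
    when any aligned pair of sizes is incompatible.
    """
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"shapes {tuple(a)} and {tuple(b)} do not broadcast")
    return tuple(reversed(out))


# Hypothetical sizes, for illustration only.
num_experts, batch_size, dim = 4, 8, 16
load_shape = (num_experts, dim)        # shape reported for `load`
importance_shape = (batch_size, dim)   # shape reported for `importance`

try:
    broadcast_shape(load_shape, importance_shape)
    mismatch = False
except ValueError:
    mismatch = True  # 4 vs 8 on the first axis cannot be broadcast

# If batch_size happens to equal num_experts, the shapes line up,
# so an elementwise operation would run without raising an error.
coincidence = broadcast_shape((4, dim), (4, dim))
```

Under these assumptions, the first axes (`num_experts` versus `batch_size`) only agree when the two sizes are equal or one of them is 1, so a test that uses equal sizes would not surface the problem.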
Bumps [pypa/gh-action-pypi-publish](https://github.com/pypa/gh-action-pypi-publish) from 1.8.11 to 1.10.3. Release notes, sourced from pypa/gh-action-pypi-publish's releases: v1.10.3. 💅 Cosmetic Output Improvements: In #270, @facutuesca💰 made a follow-up to their previous PR #250, making the...