DeepSpeed
Add RTS and token masking to top-2 gating + configurable jitter epsilon
- Add Random Token Selection to top-2 gating
- Add token masking to top-2 gating
- Add a no-drop-tokens option to top-2 gating
- Add configurable jitter epsilon (both top-1 and top-2)
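
For readers unfamiliar with these terms, the four items above can be sketched in plain Python. This is only an illustrative toy, not the code in this PR or DeepSpeed's actual gating implementation: the function names (`multiplicative_jitter`, `top2_experts`, `top2_route`) and parameters (`used_token`, `use_rts`, `drop_tokens`, `epsilon`) are hypothetical names chosen for the example, and it assumes that Random Token Selection means choosing the surviving tokens at random when an expert is over capacity, rather than keeping the earliest ones.

```python
import random


def multiplicative_jitter(values, epsilon=1e-2, rng=None):
    """Scale each value by a factor drawn uniformly from [1 - epsilon, 1 + epsilon]."""
    rng = rng or random.Random()
    return [v * rng.uniform(1.0 - epsilon, 1.0 + epsilon) for v in values]


def top2_experts(scores):
    """Return the indices of the two highest-scoring experts for one token."""
    order = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    return order[0], order[1]


def top2_route(logits, capacity, used_token=None, use_rts=True,
               drop_tokens=True, rng=None):
    """Toy top-2 routing. Returns {expert index: sorted list of kept token ids}.

    logits: one list of expert scores per token (hypothetical input layout).
    """
    rng = rng or random.Random()
    num_experts = len(logits[0])
    requests = {e: [] for e in range(num_experts)}
    for t, scores in enumerate(logits):
        # Token masking: masked (e.g. padding) tokens request no expert.
        if used_token is not None and not used_token[t]:
            continue
        for e in top2_experts(scores):
            requests[e].append(t)
    if not drop_tokens:
        # No-drop mode: grow capacity to the busiest expert's load.
        capacity = max(len(toks) for toks in requests.values())
    kept = {}
    for e, toks in requests.items():
        if len(toks) > capacity:
            if use_rts:
                # Random Token Selection: pick survivors uniformly at random.
                toks = sorted(rng.sample(toks, capacity))
            else:
                # Position-based: earlier tokens in the batch win.
                toks = toks[:capacity]
        kept[e] = toks
    return kept
```

With `use_rts=False`, tokens late in the batch are always the ones dropped when an expert overflows; the random variant spreads the drops across positions. Setting `drop_tokens=False` trades a fixed buffer size for the guarantee that every unmasked token reaches both of its experts.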
Hi @awan-10. I am trying to add some features to top-2 gating that are currently available only for top-1 gating. Please take a look and let me know what you think.
Can one of the admins verify this patch?
Hi @ykim362 - is this still an issue or a PR you'd like to see completed? If so we can fix the conflicts and review, otherwise we would like to close and clean up some older PRs.
Hi @ykim362 - I'm going to close this PR for now. If this is still something you'd like to merge, we'd be happy to review promptly next time, just re-open and we will take a look. Thanks for contributing to DeepSpeed!