DeepSpeed
Fix missing scale attributes for GPTJ
This PR fixes two regressions introduced in the DeepSpeed chat release for GPT-J:
- Checks for the `scale` attribute on every parameter before accessing it.
- Changes the workspace offsets to avoid a scenario where the same buffer is used twice and its data is overwritten.
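
The two fixes can be illustrated with a minimal, framework-free sketch. This is not the DeepSpeed implementation: `collect_scales`, `carve_workspace`, `regions_overlap`, and the example parameters are hypothetical names chosen for illustration. It assumes that only some parameters carry a `scale` attribute and that the workspace is a single flat buffer divided by offsets.

```python
from types import SimpleNamespace


def collect_scales(params, default=None):
    """Return each parameter's `scale`, or `default` when the attribute is absent."""
    # getattr with a default avoids the AttributeError raised by `p.scale`
    # on parameters that were never given a scale.
    return [getattr(p, "scale", default) for p in params]


def carve_workspace(sizes):
    """Assign non-overlapping [start, end) regions of one flat buffer."""
    regions, offset = [], 0
    for size in sizes:
        regions.append((offset, offset + size))
        offset += size  # the next region starts where this one ends
    return regions


def regions_overlap(regions):
    """Return True if any two regions share part of the buffer."""
    ordered = sorted(regions)
    return any(
        prev_end > start
        for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
    )


# Hypothetical parameters: one carries a scale, one does not.
quantized = SimpleNamespace(name="qkv", scale=0.5)
plain = SimpleNamespace(name="bias")

scales = collect_scales([quantized, plain])
regions = carve_workspace([4, 4, 8])
```

Deriving each offset from the running total of the preceding sizes is one simple way to guarantee that no two consumers share a region. A check such as `regions_overlap` can flag a layout where one buffer would be written twice.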