Connor Holmes
Fixes a segfault triggered by prompts with long sequence lengths.
This PR fixes two regressions introduced in the DeepSpeed chat release for GPT-J: 1. Checks for the `scale` attribute on all parameters before accessing it. 2. Changes workspace offsets to avoid...
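The first fix above, guarding access to `scale`, can be illustrated with a minimal sketch. `FakeParam`, `get_scale`, and `collect_scales` are hypothetical names used only for illustration; they are not DeepSpeed classes or functions, and the actual change in the PR may look different.

```python
class FakeParam:
    """Hypothetical stand-in for a model parameter (not a DeepSpeed class)."""

    def __init__(self, data, scale=None):
        self.data = data
        # Only some parameters (e.g. quantized ones) carry a `scale` attribute.
        if scale is not None:
            self.scale = scale


def get_scale(param, default=None):
    # getattr with a default avoids AttributeError when `scale` is missing.
    return getattr(param, "scale", default)


def collect_scales(params):
    # Check for the attribute first, then access it.
    return [p.scale for p in params if hasattr(p, "scale")]
```

Either form works; `getattr` with a default is convenient when a fallback value is meaningful, while `hasattr` reads better when parameters without `scale` should simply be skipped.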
When an HF config is available, this PR changes the OPT policy to explicitly check which activation function is used. This bug was reported in https://github.com/microsoft/DeepSpeed/issues/3263.
Fixes incorrect double-equals syntax.
This PR introduces a number of features and bugfixes: - The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature...
Updates asymmetric quantization to reduce maximum error at the cost of slightly higher average error.
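For readers unfamiliar with the terms, the following is a generic sketch of asymmetric (min/max offset) quantization together with the two error metrics mentioned above. It is a textbook formulation in plain Python, not the DeepSpeed kernel, and it does not reproduce the specific change made in this update; all function names are hypothetical.

```python
def asym_quantize(values, num_bits=8):
    """Map floats onto integers in [0, 2**num_bits - 1] using a min/max range."""
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a zero range when all values are equal.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return q, scale, lo


def asym_dequantize(q, scale, offset):
    """Recover approximate floats from quantized integers."""
    return [x * scale + offset for x in q]


def error_stats(values, restored):
    """Return (maximum absolute error, mean absolute error)."""
    errs = [abs(a - b) for a, b in zip(values, restored)]
    return max(errs), sum(errs) / len(errs)
```

Comparing the two numbers returned by `error_stats` before and after a change to the rounding or range selection is one way to observe a maximum-versus-average error trade-off of the kind described.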
This adds a global cache for creating new comm groups. Rather than returning unique objects, an identical group (same backend, same ranks) will share a single object. The motivation for...