Connor Holmes issues

Results: 8 issues of Connor Holmes

Extend scratch buffer for long prompts

Fixes a segfault on prompts with long sequence lengths.

AMD Kernel Compatibility Fixes

Fix missing scale attributes for GPTJ

This PR fixes two regressions introduced in the DeepSpeed-Chat release for GPT-J: 1. Checks for the `scale` attribute on all parameters before accessing it. 2. Changes workspace offsets to avoid...
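The first fix describes a guard-before-access pattern. A minimal sketch of that pattern follows, assuming a hypothetical `Param` class and a fallback scale of `1.0`; neither is taken from DeepSpeed, and the actual PR may handle a missing `scale` differently.

```python
class Param:
    """Hypothetical stand-in for a model parameter; not a DeepSpeed class."""

    def __init__(self, data, scale=None):
        self.data = data
        if scale is not None:
            # Only some parameters (for example quantized ones) carry a scale.
            self.scale = scale


def get_scale(param, default=1.0):
    # getattr with a default avoids an AttributeError on parameters
    # that were never given a `scale` attribute. The default is an assumption.
    return getattr(param, "scale", default)


params = [Param([1, 2], scale=0.5), Param([3, 4])]
scales = [get_scale(p) for p in params]
print(scales)
```

Using `hasattr(param, "scale")` before the access would work equally well; `getattr` with a default simply folds the check and the read into one call.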

Explicitly check for OPT activation function

When an HF config is available, this PR changes the OPT policy to explicitly check which activation function is used. This bug was reported in https://github.com/microsoft/DeepSpeed/issues/3263.
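An explicit check of this kind might look like the sketch below. The function name `resolve_activation`, the supported set, and the `relu` fallback are illustrative assumptions, and the config field is assumed to be named `activation_function`; this is not the policy code from the PR.

```python
from types import SimpleNamespace

# Assumed set of activations the inference path can handle.
SUPPORTED_ACTIVATIONS = {"relu", "gelu"}


def resolve_activation(hf_config=None, default="relu"):
    """Return the activation name to use, checking the config when present."""
    if hf_config is None:
        # No config available: fall back to the assumed default.
        return default
    activation = getattr(hf_config, "activation_function", default)
    if activation not in SUPPORTED_ACTIVATIONS:
        raise ValueError(f"Unsupported activation function: {activation!r}")
    return activation


# SimpleNamespace stands in for a real model config object.
config = SimpleNamespace(activation_function="gelu")
print(resolve_activation(config))
print(resolve_activation())
```

Raising on an unknown value, rather than silently falling back, makes a mismatched model configuration visible at load time.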

OPT Activation Function Hotfix

Fixes incorrect double-equals syntax.

Hybrid Engine Refactor and Llama Inference Support

This PR introduces a number of features and bugfixes: - The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature...

Asymmetric quant algorithm update

Updates asymmetric quantization to reduce maximum error at the cost of slightly higher average error.
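For context, the sketch below shows a plain min/max asymmetric quantizer and how the two error measures mentioned above can be computed. It is a generic textbook scheme written for illustration, not the updated algorithm from this PR, and all names in it are hypothetical.

```python
def quantize_asymmetric(values, num_bits=8):
    """Map floats onto integers in [0, 2**num_bits - 1] using a min/max range."""
    qmax = (1 << num_bits) - 1
    lo, hi = min(values), max(values)
    # Guard against a zero range when all values are equal.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    quantized = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return quantized, scale, lo


def dequantize(quantized, scale, lo):
    """Invert the mapping; the result differs from the input by rounding error."""
    return [q * scale + lo for q in quantized]


values = [-1.0, -0.25, 0.0, 0.4, 3.0]
q, scale, lo = quantize_asymmetric(values)
restored = dequantize(q, scale, lo)

errors = [abs(a - b) for a, b in zip(values, restored)]
max_error = max(errors)
mean_error = sum(errors) / len(errors)
print(q, max_error, mean_error)
```

With round-to-nearest, each element's error is bounded by half a quantization step, so `max_error` and `mean_error` give the two quantities a change like this one trades against each other.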

Add Cache to Comm Group

This adds a global cache for creating new comm groups. Rather than returning unique objects, an identical group (same backend, same ranks) will share a single object. The motivation for...