DeepSpeed Hybrid Engine Refactor and Llama Inference Support

Open cmikeh2 opened this issue 1 year ago • 0 comments

This PR introduces a number of features and bugfixes:

- The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature container: either HybridEngineContainer itself or a more specialized container for the particular model architecture.
- Llama is now supported for both inference and, through the Hybrid Engine, accelerated RLHF training.
- Additional BF16 compilation support
- Additional unit test coverage for new operators and data types
- Cleanup of unused code
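
The inheritance pattern described in the first item can be sketched roughly as follows. This is an illustration only, not the code in this PR: every class and method name below (`BaseContainer`, `HybridEngineFeature`, `SplitQKVHybridFeature`, `LlamaLikeContainer`, `fuse_for_inference`, `restore_for_training`) is hypothetical, and plain Python lists stand in for weight tensors.

```python
from abc import ABC, abstractmethod


class BaseContainer:
    """Hypothetical stand-in for a per-model container."""

    def __init__(self, name):
        self.name = name
        self.mode = "train"


class HybridEngineFeature(ABC):
    """Hypothetical feature base: switch between training and inference layouts."""

    @abstractmethod
    def fuse_for_inference(self):
        ...

    @abstractmethod
    def restore_for_training(self):
        ...


class SplitQKVHybridFeature(HybridEngineFeature):
    """Hypothetical specialization for models that keep Q, K, V as separate weights."""

    def fuse_for_inference(self):
        # List concatenation stands in for concatenating the three weight tensors.
        self.qkv = self.q + self.k + self.v
        self.mode = "inference"

    def restore_for_training(self):
        # Drop the fused copy; the separate q/k/v weights remain the source of truth.
        self.qkv = None
        self.mode = "train"


class LlamaLikeContainer(BaseContainer, SplitQKVHybridFeature):
    """A model container opts in to the feature simply by inheriting from it."""

    def __init__(self, q, k, v):
        super().__init__("llama-like")
        self.q, self.k, self.v = q, k, v
        self.qkv = None


container = LlamaLikeContainer([1], [2], [3])
container.fuse_for_inference()
print(container.mode, container.qkv)
container.restore_for_training()
print(container.mode, container.qkv)
```

The point of the sketch is the design choice: architecture-specific behaviour lives in a specialized feature class, so a model container gains Hybrid Engine support by choosing the appropriate base rather than by reimplementing the mode switch.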

May 01 '23 22:05 cmikeh2