DeepSpeed
Hybrid Engine Refactor and Llama Inference Support
This PR introduces a number of features and bugfixes:
- The Hybrid Engine integration with Containers has been refactored. Models that support the Hybrid Engine now inherit from a feature container, either the `HybridEngineContainer` itself or something more specialized for the particular model architecture.
- Llama support for both inference and RLHF training acceleration with Hybrid Engine support
- Additional BF16 compilation support
- Additional unit test coverage for new operators and data types
- Cleanup of unused code
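
To illustrate the feature-container pattern described in the first bullet, here is a minimal sketch. It is not the DeepSpeed implementation: `BaseContainer`, `HybridEngineContainerSketch`, `LlamaContainerSketch`, `set_params`, `switch_mode`, and `supports_hybrid_engine` are all hypothetical names chosen for this example. The idea shown is only that Hybrid Engine capability is expressed by inheriting from a feature class, so it can be detected with a plain `isinstance` check.

```python
from abc import ABC, abstractmethod


class BaseContainer:
    """Hypothetical stand-in for a per-model container (not a DeepSpeed class)."""

    def __init__(self, name):
        self.name = name


class HybridEngineContainerSketch(ABC):
    """Hypothetical feature container: inheriting from it marks a model
    container as able to switch between training and inference."""

    @abstractmethod
    def set_params(self, mode):
        """Select the parameters to use for the given mode."""

    def switch_mode(self, mode):
        # Validate the mode, let the model-specific container react, then record it.
        if mode not in ("train", "inference"):
            raise ValueError(f"unknown mode: {mode}")
        self.set_params(mode)
        self.mode = mode
        return self.mode


class LlamaContainerSketch(BaseContainer, HybridEngineContainerSketch):
    """Hypothetical model container that opts in to the feature."""

    def set_params(self, mode):
        # A real container would remap weight tensors; here we only record a label.
        self.active_params = f"{self.name}-{mode}-params"


def supports_hybrid_engine(container):
    # Capability check by inheritance rather than by a per-model flag.
    return isinstance(container, HybridEngineContainerSketch)
```

Under these assumptions, a more specialized feature container for a given architecture would subclass `HybridEngineContainerSketch` and override only what differs, while the capability check stays the same.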