nebuly
Implement a new optimizer for TF and torch using FasterTransformer as a backend
Description
FasterTransformer is a library developed by Nvidia specifically for accelerating transformer architectures on Nvidia devices. We should test its performance and implement a conversion framework for converting TF, HF, and Torch models into FasterTransformer-supported objects.
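Whatever form the conversion framework takes, its core step is matching every source-framework parameter name to a slot in the backend's weight layout. The sketch below illustrates this for a few Hugging Face BERT attention parameters; the target naming scheme is hypothetical and the real FasterTransformer layout must be taken from its documentation.

```python
import re
from typing import Optional

# Illustrative pattern for HF BERT attention parameters, e.g.
# "encoder.layer.3.attention.self.query.weight"
_HF_BERT_ATTN = re.compile(
    r"encoder\.layer\.(\d+)\.attention\.self\.(query|key|value)\.(weight|bias)"
)


def map_hf_bert_name(name: str) -> Optional[str]:
    """Map an HF BERT parameter name to a (hypothetical) backend weight slot.

    Returns None for names this sketch does not cover; a real converter
    must handle every parameter explicitly or fail loudly.
    """
    m = _HF_BERT_ATTN.match(name)
    if m is None:
        return None
    layer, proj, kind = m.groups()
    # Target pattern is an assumption, not the real FasterTransformer schema.
    return f"layers.{layer}.attention.{proj[0]}.{kind}"
```

A real PoC would apply such a mapping over `model.state_dict()` and report any unmapped names before handing the weights to the backend.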
Integration
The FasterTransformer integration will be treated in Speedster as a Compiler, and it will have both PyTorch and TensorFlow interfaces (both frameworks are supported by the library). Like all compilers, the FasterTransformer one will need to support fp16 and int8 computation.
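A compiled model is exposed to users through an `InferenceLearner`-style wrapper that mimics the original model's call interface. The class below is a minimal sketch of that idea, assuming the compiled backend is a plain callable; the class name, the `precision` field, and the validation logic are illustrative, not Speedster's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Precisions the hypothetical compiler is expected to support.
SUPPORTED_PRECISIONS = {"fp32", "fp16", "int8"}


@dataclass
class FasterTransformerInferenceLearner:
    """Hypothetical wrapper: exposes the compiled model with the
    same call signature users expect from the original model."""

    compiled_model: Callable[..., Any]
    precision: str = "fp32"

    def __post_init__(self) -> None:
        if self.precision not in SUPPORTED_PRECISIONS:
            raise ValueError(f"unsupported precision: {self.precision}")

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Delegate to the backend; a real learner would also handle
        # input/output conversion between framework tensors and the backend.
        return self.compiled_model(*args, **kwargs)
```

Usage mirrors the original model: `learner = FasterTransformerInferenceLearner(backend_fn, precision="fp16")`, then `learner(inputs)`.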
TODO list
- [ ] Implement a PoC using the OS-conversion (or implement our own conversion if it doesn't exist)
- [ ] Analyse the impact of the feature with respect to the current Speedster
- [ ] If a positive impact of the feature is assessed, implement it as a Compiler in Speedster. Note that when implementing a new Compiler we need to implement its InferenceLearner as well. InferenceLearners are the Python objects we use to wrap the compiled model and expose an interface similar to the original model.
- [ ] Fork the nebullvm repo https://github.com/nebuly-ai/nebullvm
- [ ] Read the Contribution Guidelines
- [ ] Create a PR to main explaining your changes and showing the improvements obtained using FasterTransformer with respect to the previous version
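To show the improvement in the PR, latency of the original and the FasterTransformer-compiled model should be measured the same way. A minimal timing harness sketch, assuming both models are exposed as plain callables (median is used because it is robust to occasional slow runs):

```python
import statistics
import time
from typing import Any, Callable


def median_latency_ms(
    fn: Callable[..., Any],
    *args: Any,
    warmup: int = 3,
    runs: int = 20,
) -> float:
    """Median wall-clock latency of fn(*args) in milliseconds.

    Warmup iterations are discarded so one-time costs (lazy init,
    JIT compilation, cache fills) do not skew the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)
```

In practice one would report `median_latency_ms(original_model, batch)` against `median_latency_ms(compiled_model, batch)` for the same inputs; on GPU, a real benchmark must also synchronize the device before reading the clock.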