
Implement a new optimizer for TF and torch using FasterTransformer as a backend

Open diegofiori opened this issue 2 years ago • 0 comments

Description

FasterTransformer is a library developed by Nvidia specifically for accelerating transformer architectures on Nvidia devices. We should test its performance and implement a conversion framework for converting TF, HF, and Torch models into FasterTransformer-supported objects.
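As a rough sketch of what the conversion front-end could look like, the snippet below extracts a Hugging Face encoder's weights into plain numpy arrays keyed by parameter name, which is roughly the kind of intermediate representation FasterTransformer's open-source conversion scripts consume. The function name and output layout are assumptions for illustration, not FasterTransformer's actual format.

```python
# Hypothetical sketch: dump a Hugging Face encoder's weights to numpy
# arrays keyed by parameter name. The output layout is an assumption,
# not FasterTransformer's actual on-disk format.
from transformers import AutoModel

def extract_encoder_weights(model_name: str) -> dict:
    """Load a Hugging Face encoder and export its weights as numpy arrays."""
    model = AutoModel.from_pretrained(model_name).eval()
    return {
        # FasterTransformer kernels work on contiguous CPU/GPU buffers,
        # so we detach each tensor and move it to host memory.
        name: tensor.detach().cpu().numpy()
        for name, tensor in model.state_dict().items()
    }

if __name__ == "__main__":
    weights = extract_encoder_weights("bert-base-uncased")
    print(f"extracted {len(weights)} tensors")
```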

Integration

The FasterTransformer integration will be treated in Speedster as a Compiler, and it will have both a PyTorch and a TensorFlow interface (both frameworks are supported by the library). Like all compilers, the FasterTransformer one will need to support fp16 and int8 computations.
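A minimal sketch of how this could slot into the Compiler abstraction, assuming a generic base class; the actual nebullvm/Speedster interfaces may differ, and all names below are hypothetical:

```python
# Hypothetical sketch of a FasterTransformer compiler for Speedster.
# `BaseCompiler` and its interface are assumptions for illustration.
from enum import Enum

class Precision(Enum):
    FP32 = "fp32"
    FP16 = "fp16"
    INT8 = "int8"

class BaseCompiler:
    def compile(self, model, precision: Precision):
        raise NotImplementedError

class FasterTransformerCompiler(BaseCompiler):
    """Converts a PyTorch or TensorFlow transformer model into a
    FasterTransformer-backed runtime object (hypothetical flow)."""

    def compile(self, model, precision: Precision = Precision.FP16):
        # 1. Export the model weights to the layout FasterTransformer expects.
        weights = self._export_weights(model)
        # 2. Build the FasterTransformer runtime at the requested precision
        #    (fp16 and int8 must be supported, like for every compiler).
        return self._build_runtime(weights, precision)

    def _export_weights(self, model):
        ...  # framework-specific weight extraction (PyTorch or TensorFlow)

    def _build_runtime(self, weights, precision):
        ...  # instantiate FasterTransformer ops with the chosen precision
```

Exposing the precision as an explicit argument keeps the fp16/int8 requirement visible at the interface level rather than buried in the conversion logic.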

TODO list

  • [ ] Implement a PoC using the OS (open-source) conversion (or implement our own conversion if one doesn’t exist)
  • [ ] Analyse the impact of the feature with respect to the current Speedster
  • [ ] If the feature is assessed to have a positive impact, we should implement it as a Compiler in Speedster. Note that when implementing a new Compiler we need to implement its InferenceLearner as well (see the sketch after this list). InferenceLearners are the Python objects we use for wrapping the compiled model and exposing an interface similar to the original model’s.
  • [ ] Fork the nebullvm repo https://github.com/nebuly-ai/nebullvm
  • [ ] Read the Contribution Guidelines
  • [ ] Create a PR to main explaining your changes and showing the improvements obtained using FasterTransformer with respect to the previous version
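As referenced in the list above, here is a minimal sketch of the InferenceLearner wrapper, assuming a plain callable interface; the real nebullvm InferenceLearner base class may differ, and `ft_model` stands in for whatever compiled object the FasterTransformer backend produces:

```python
# Hypothetical sketch of an InferenceLearner wrapping a FasterTransformer-
# compiled model so it can be called like the original PyTorch module.
# The interface shown here is an assumption, not nebullvm's actual API.
import torch

class FasterTransformerInferenceLearner:
    def __init__(self, ft_model, device: str = "cuda"):
        self.ft_model = ft_model  # compiled FasterTransformer object
        self.device = device

    def __call__(self, *args: torch.Tensor, **kwargs) -> torch.Tensor:
        # Mirror the original model's call signature so downstream code
        # does not need to change after optimization.
        inputs = [t.to(self.device) for t in args]
        return self.ft_model.forward(*inputs, **kwargs)
```

Wrapping the compiled object this way lets existing inference code swap in the optimized model without changes, which also makes the before/after benchmark for the PR straightforward to run.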

Resources:

FasterTransformer Library: https://github.com/NVIDIA/FasterTransformer

diegofiori · Jan 16 '23, 12:01