Custom backend support.
It is hard to switch kernel implementations in the PyTorch Engine, and patching transformers models makes it difficult for us to carry out more aggressive optimizations.
This PR plans to refactor the PyTorch engine. We added an operator abstraction layer and made the engine capable of selecting the most suitable operator backend based on the current context.
- lmdeploy/pytorch/layers: the op abstraction layer. Deployed models are built on this infrastructure.
- lmdeploy/pytorch/backends: operator implementations are dispatched here based on the device and environment (see the sketch after this list).
- CUDA graph support, so kernel launch overhead is no longer the main bottleneck (a capture/replay sketch follows the backend sketch below).
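A minimal sketch of how the layers/backends split could dispatch by device. The names `OpsBackend`, `CudaBackend`, `get_backend`, and `SiluAndMul` are hypothetical stand-ins for illustration, not the PR's actual classes:

```python
# Hypothetical sketch of the layers/backends split; names are illustrative.
import torch


class OpsBackend:
    """Base class: each device backend registers its op implementations."""
    registry: dict[str, type["OpsBackend"]] = {}

    def __init_subclass__(cls, device: str, **kwargs):
        super().__init_subclass__(**kwargs)
        OpsBackend.registry[device] = cls

    def silu_and_mul(self, x: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError


class CudaBackend(OpsBackend, device="cuda"):
    def silu_and_mul(self, x: torch.Tensor) -> torch.Tensor:
        # A real CUDA backend could call a fused kernel here;
        # the sketch falls back to eager ops.
        gate, up = x.chunk(2, dim=-1)
        return torch.nn.functional.silu(gate) * up


class CpuBackend(OpsBackend, device="cpu"):
    def silu_and_mul(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = x.chunk(2, dim=-1)
        return torch.nn.functional.silu(gate) * up


def get_backend(device: str) -> OpsBackend:
    """Select the backend for the current context (device/environment)."""
    return OpsBackend.registry[device]()


class SiluAndMul(torch.nn.Module):
    """Abstraction-layer module: models are built against this class,
    while the actual kernel comes from whichever backend is selected."""

    def __init__(self, device: str):
        super().__init__()
        self.impl = get_backend(device)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.impl.silu_and_mul(x)


if __name__ == "__main__":
    layer = SiluAndMul("cuda" if torch.cuda.is_available() else "cpu")
    print(layer(torch.randn(2, 8)).shape)  # torch.Size([2, 4])
```

Because the model only references the abstraction-layer module, swapping kernels for a new device reduces to registering another backend class rather than patching the model.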
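For the CUDA graph point, a rough illustration of the capture/replay pattern PyTorch exposes via `torch.cuda.CUDAGraph`. The static buffers, shapes, and warmup loop here are illustrative, not the engine's actual integration:

```python
import torch

assert torch.cuda.is_available(), "CUDA graphs require a CUDA device"

model = torch.nn.Linear(4096, 4096, device="cuda").eval()
static_x = torch.zeros(8, 4096, device="cuda")

with torch.no_grad():
    # Warm up on a side stream so capture sees initialized kernels/allocations.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_x)
    torch.cuda.current_stream().wait_stream(s)

    # Capture: every kernel launched inside the block is recorded into the graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_y = model(static_x)

    # Replay: refill the static input buffer, then relaunch the whole graph
    # with a single call, amortizing per-kernel launch overhead.
    static_x.copy_(torch.randn(8, 4096, device="cuda"))
    g.replay()
    print(static_y.sum().item())
```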