
Implement triton kernels for inference

Open yuanheng-zhao opened this issue 1 year ago • 0 comments

Tracking issue for implementing Triton kernels for inference that are compatible with the relevant submodules and the KVCache.

  • Context-stage Attention https://github.com/hpcaitech/ColossalAI/pull/5192
  • Decoding-stage Attention
  • Pos Embedding
    • https://github.com/hpcaitech/ColossalAI/pull/5181
  • KVCache Copy
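For reference, the "KVCache Copy" item can be pictured with a small CPU-side sketch. This is not the Triton kernel tracked here and does not reflect the actual ColossalAI implementation; it only illustrates the kind of operation such a kernel might perform during decoding, and could serve as a reference when checking a kernel's output. It assumes a block-based cache layout addressed through per-sequence block tables, which the issue does not specify. The function name `copy_to_kv_cache` and all parameter names are hypothetical.

```python
def copy_to_kv_cache(new_kv, cache, block_tables, seq_lens, block_size):
    """Write one new token's K (or V) vector per sequence into a block-based cache.

    Assumed (hypothetical) layout, not taken from the issue:
      new_kv:       [batch][head_dim]                  vector of the newly decoded token
      cache:        [num_blocks][block_size][head_dim] physical cache blocks
      block_tables: [batch][max_blocks]                physical block ids per sequence
      seq_lens:     [batch]                            lengths including the new token
    """
    for seq_idx, vec in enumerate(new_kv):
        pos = seq_lens[seq_idx] - 1                      # logical position of the new token
        block_id = block_tables[seq_idx][pos // block_size]  # physical block holding it
        offset = pos % block_size                        # slot inside that block
        cache[block_id][offset] = list(vec)              # copy, do not alias the input


# Tiny example: 4 physical blocks, 2 slots per block, head_dim = 2.
block_size, head_dim = 2, 2
cache = [[[0.0] * head_dim for _ in range(block_size)] for _ in range(4)]
block_tables = [[2, 0], [1, 3]]   # sequence 0 uses blocks 2 then 0; sequence 1 uses 1 then 3
seq_lens = [3, 1]                 # sequence 0 has 3 tokens, sequence 1 has 1
new_kv = [[1.0, 2.0], [3.0, 4.0]]

copy_to_kv_cache(new_kv, cache, block_tables, seq_lens, block_size)
```

In this example, sequence 0's third token lands in the first slot of physical block 0, and sequence 1's first token lands in the first slot of physical block 1. A GPU kernel would typically perform the same index arithmetic with one program instance per sequence (or per sequence and head) rather than a Python loop.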

yuanheng-zhao, Dec 19 '23 03:12