ColossalAI
Implement triton kernels for inference
Tracking issue for implementing Triton kernels for inference that are compatible with the relevant submodules and the KVCache.
- Context-stage Attention https://github.com/hpcaitech/ColossalAI/pull/5192
- Decoding-stage Attention
- Pos Embedding
- https://github.com/hpcaitech/ColossalAI/pull/5181
- KVCache Copy
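
For readers unfamiliar with the last item, here is a rough pure-Python sketch of what a KVCache copy step typically does in a blocked (paged) cache: it writes a newly computed key or value vector into the slot reserved for that token. This is only an illustration, not the Triton kernel tracked here. The function name `copy_to_kv_cache`, the block-table layout, and all parameters are hypothetical and are not taken from the linked PRs.

```python
def copy_to_kv_cache(new_kv, cache, block_table, seq_len, block_size):
    """Write one new token's key (or value) vector into a blocked cache.

    All names here are hypothetical, chosen for illustration only.
    new_kv:      vector for the token being decoded
    cache:       list of physical blocks, each holding block_size slots
    block_table: maps this sequence's logical block index -> physical block id
    seq_len:     number of tokens already cached for this sequence
    """
    logical_block = seq_len // block_size   # which logical block the token falls in
    offset = seq_len % block_size           # slot inside that block
    physical_block = block_table[logical_block]
    cache[physical_block][offset] = list(new_kv)
    return physical_block, offset


block_size, num_blocks, head_dim = 4, 3, 2
cache = [[[0.0] * head_dim for _ in range(block_size)] for _ in range(num_blocks)]
block_table = [2, 0]  # logical blocks 0 and 1 live in physical blocks 2 and 0

# The sequence already holds 5 tokens, so the new token is the 6th.
location = copy_to_kv_cache([1.0, 2.0], cache, block_table,
                            seq_len=5, block_size=block_size)
```

A GPU kernel would usually perform the same index arithmetic for many sequences and heads in parallel; the sketch only shows the addressing for a single token.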