xFasterTransformer
Master and slaves should both run according to the following workflow:

```Python
while True:
    model.set_input_cb()
    model.forward_cb()
    model.free_seqs()
```
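To make the order of the calls in the loop above easier to see, here is a minimal sketch that runs the same cycle against a stand-in object. `StubModel`, `run_worker`, and the `max_steps` parameter are hypothetical names introduced only for this illustration; they are not part of xFasterTransformer, and the stub does no inference. Only the three method names come from the workflow above.

```python
class StubModel:
    """Hypothetical stand-in for a model object; it only records call order."""

    def __init__(self):
        self.calls = []

    def set_input_cb(self):
        self.calls.append("set_input_cb")

    def forward_cb(self):
        self.calls.append("forward_cb")

    def free_seqs(self):
        self.calls.append("free_seqs")


def run_worker(model, max_steps=None):
    """Repeat the set_input_cb -> forward_cb -> free_seqs cycle.

    max_steps is an assumption added so the demo terminates;
    with None the loop runs forever, like `while True` above.
    """
    step = 0
    while max_steps is None or step < max_steps:
        model.set_input_cb()
        model.forward_cb()
        model.free_seqs()
        step += 1
    return step


model = StubModel()
steps = run_worker(model, max_steps=2)
print(steps, model.calls)
```

Because every rank runs the identical loop body, the same `run_worker` function could in principle be started on the master and on each slave; how the ranks are launched and synchronized is outside the scope of this sketch.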
```bash
# weight only FP16 (input FP32, weight FP16, output FP32)
[INFO] First token time: 148.062 ms
[INFO] Second token time: 48.3581 ms
[INFO] Final output is:
==============================================
Once upon...
```
