sys_reading
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU
https://arxiv.org/pdf/2303.06865.pdf
https://proceedings.mlr.press/v202/sheng23a.html