FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Open • pentium3 opened this issue 1 year ago • 1 comment

https://arxiv.org/pdf/2303.06865.pdf

pentium3, Feb 28 '24 02:02

https://proceedings.mlr.press/v202/sheng23a.html

pentium3, Mar 09 '24 09:03