sys_reading

FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU

Open • pentium3 opened this issue 1 year ago • 1 comment

https://arxiv.org/pdf/2303.06865.pdf

Feb 28 '24 02:02 pentium3

https://proceedings.mlr.press/v202/sheng23a.html

Mar 09 '24 09:03 pentium3