pentium3

Results: 31 comments by pentium3

# summary

## key problem

### workload

Efficient generative inference for **Transformer models** (whereas #256 applies generally to all DNN models). Large deep models with tight latency targets...

https://www.usenix.org/conference/osdi22/presentation/yu

https://www.anuragkhandelwal.com/papers/shepherd.pdf

https://dl.acm.org/doi/pdf/10.1145/3600006.3613175

https://assets.amazon.science/4b/ee/9fa14afa47d3bcaa9c54b904daa5/diffusionpipe-training-large-diffusion-models-with-efficient-pipelines.pdf