
A question about single-GPU inference

Open · TitleZ99 opened this issue 1 year ago · 1 comment

Thanks for this great work. I'm wondering how to run inference on a single 8 GB GPU, like the example shown in the README. I tried it on my RTX 2080 Ti with 11 GB and got a CUDA out-of-memory error.
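For context, a rough back-of-the-envelope calculation (assuming a 7B-parameter LLaMA model; the exact parameter count and runtime overhead will vary) shows why unquantized fp16 weights alone already overflow an 11 GiB card, and why the README's single-GPU examples rely on lower-precision quantization:

```python
# Rough memory-footprint arithmetic for the weights of a 7B-parameter
# model (assumed size). Activations and the KV cache add overhead on
# top of these figures, so the real requirement is somewhat higher.
PARAMS = 7e9  # assumed: 7 billion parameters

def weight_gib(bytes_per_param: float) -> float:
    """GiB needed just to hold the weights at a given precision."""
    return PARAMS * bytes_per_param / 1024**3

fp16 = weight_gib(2)    # ~13 GiB  -> exceeds an 11 GiB RTX 2080 Ti
int8 = weight_gib(1)    # ~6.5 GiB -> borderline on an 8 GiB GPU
int4 = weight_gib(0.5)  # ~3.3 GiB -> fits comfortably in 8 GiB

print(f"fp16: {fp16:.1f} GiB, int8: {int8:.1f} GiB, int4: {int4:.1f} GiB")
```

So an out-of-memory error with the default (fp16) weights on an 11 GiB card is expected; fitting the model into 8 GiB would require 8-bit or lower quantization.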

TitleZ99 avatar Apr 07 '23 08:04 TitleZ99