aero
Is it possible to reduce GPU memory usage during inference?
Inference appears to take more than 30 GB of GPU memory. Which parameters can I set to reduce the memory footprint?
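For context, my rough expectation was based on a back-of-envelope estimate of weight memory at different precisions (a generic sketch, not specific to this project; the 7B parameter count is a hypothetical example):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough GPU memory for model weights alone (excludes activations/KV cache)."""
    return num_params * bytes_per_param / 1024**3

params = 7e9  # hypothetical 7B-parameter model
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(params, nbytes):.1f} GB")
```

So if the model is being loaded in full fp32, a half-precision or quantized loading option alone could roughly halve or quarter the weight memory, which is why I'm asking which parameters are exposed.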
Thanks!