aero
Is it possible to reduce GPU memory usage during inference?
Inference appears to take more than 30 GB of GPU memory. Which parameters can I set to reduce the memory footprint?
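For context, my rough expectation was based on a back-of-envelope estimate of weight memory at different precisions (a generic sketch, not specific to this project; the 7B parameter count is a hypothetical example):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Rough GPU memory for model weights alone (excludes activations/KV cache)."""
    return num_params * bytes_per_param / 1024**3

params = 7e9  # hypothetical 7B-parameter model
for name, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(params, nbytes):.1f} GB")
```

So if the model is being loaded in full fp32, a half-precision or quantized loading option alone could roughly halve or quarter the weight memory, which is why I'm asking which parameters are exposed.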
Thanks!