Limit CUDA memory growth

Open coderchem opened this issue 2 years ago • 0 comments

Hi, when I deployed the 7B LLaMA model on an A40, I found that CUDA memory usage keeps growing without limit. Does FasterTransformer provide any way to limit CUDA memory growth? I deployed with tritonserver, using batch=64 and otherwise default parameters. Thanks.

coderchem · Aug 09 '23 06:08