FasterTransformer
Limit CUDA memory growth
Hi, when I deployed the 7B LLaMA model on an A40, I found that CUDA memory usage keeps growing without limit. Does FasterTransformer provide any way to limit CUDA memory growth? I deployed with tritonserver, using batch=64 and the default parameters. Thanks.