sneglen

Results 3 comments of sneglen

Thank you for the clarification with the formulas. I understand the issue better now. However, I am still a bit puzzled, and to me the guidance seems to be conflicting...

I set `--mem-fraction-static` to 0.9, which seems to be a reasonable value (?), and ended up using an A100 (40GB), which for my case is more than enough for inference....
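
To make the numbers concrete, here is a rough back-of-the-envelope sketch of what a 0.9 fraction would leave on a 40GB card. It assumes that `--mem-fraction-static` is the share of total GPU memory reserved for static allocations (model weights plus the KV cache pool), which may not match the exact accounting the server does. The helper name `static_memory_budget_gb` is made up for illustration and is not part of any library.

```python
def static_memory_budget_gb(total_gb: float, mem_fraction_static: float) -> tuple[float, float]:
    """Split total GPU memory into a static share and the remaining headroom.

    Assumption: the fraction applies to the card's total memory. The real
    server may compute it against free memory or apply extra overheads.
    """
    if not 0.0 < mem_fraction_static <= 1.0:
        raise ValueError("mem_fraction_static must be in (0, 1]")
    static_gb = total_gb * mem_fraction_static
    headroom_gb = total_gb - static_gb
    return static_gb, headroom_gb


# Values from the comment above: A100 with 40GB, fraction 0.9.
static_gb, headroom_gb = static_memory_budget_gb(40.0, 0.9)
print(f"static: {static_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

Under that assumption, whatever is not covered by the static share has to fit activations, CUDA graphs, and other temporary buffers, so a lower fraction trades KV cache capacity for more headroom.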

I had a similar issue with Mistral, and a workaround was to **update triton from 2.1.0 to 2.2.0**. I found a hint [here](https://github.com/openai/triton/issues/1254#issuecomment-1750089379). It triggers a dependency error where pip...