sneglen
Thank you for the clarification with the formulas. I understand the issue better now. However, I am still a bit puzzled, and to me the guidance seems to be conflicting...
I set `--mem-fraction-static` to 0.9, which seems to be a reasonable value (?), and ended up using an A100 (40GB), which for my case is more than enough for inference....
I had a similar issue with Mistral, and a workaround was to **update triton from 2.1.0 to 2.2.0**. I found a hint [here](https://github.com/openai/triton/issues/1254#issuecomment-1750089379). It triggers a dependency error where pip...
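
In case it helps anyone verify their environment before and after the upgrade, here is a minimal sketch that checks whether the installed triton is at least 2.2.0. It assumes the package is installed under the distribution name `triton`; the helper names (`parse_version`, `meets_minimum`, `installed_triton_version`) are mine and not part of any library.

```python
from importlib import metadata


def parse_version(version):
    """Turn a version string like '2.2.0+cu121' into a tuple like (2, 2, 0)."""
    parts = []
    # Drop any local suffix after '+', then read leading digits of each piece.
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    # Pad to three components so '2.2' compares equal to '2.2.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="2.2.0"):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_triton_version():
    """Return the installed triton version string, or None if not installed."""
    try:
        return metadata.version("triton")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_triton_version()
    if found is None:
        print("triton is not installed in this environment")
    elif meets_minimum(found):
        print(f"triton {found} is at least 2.2.0")
    else:
        print(f"triton {found} is older than 2.2.0")
```

The comparison only looks at the numeric release components, so pre-release or local build tags are ignored.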