2 comments of Claudio Montanari
You should be able to disable prefix caching by starting the server with `PREFIX_CACHING=0`. That's how I got the `llama 3.2 vision` models to work.
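For context, a minimal sketch of what that launch could look like, assuming the Hugging Face `text-generation-inference` launcher; the binary name and model id below are illustrative, not taken from the thread:

```shell
# Disable prefix caching via the environment when starting the server.
# Launcher binary and --model-id are assumptions for illustration.
PREFIX_CACHING=0 text-generation-launcher \
  --model-id meta-llama/Llama-3.2-11B-Vision-Instruct
```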
Hey, I think this is expected behavior. The output of your `curl` to `/v1//chat/completions` reports `14` completion tokens. Based on your logs for the first request...
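For reference, a request along these lines surfaces the token counts in the response's `usage` object; the host, port, and payload here are assumptions, so adjust them to your deployment:

```shell
# Hypothetical endpoint; replace host/port with your server's address.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "tgi", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 32}'
# The JSON response's "usage" field reports prompt_tokens and completion_tokens,
# which is where the completion-token count above comes from.
```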