Claudio Montanari

2 comments by Claudio Montanari

You should be able to disable prefix caching by starting the server with `PREFIX_CACHING=0`. That's how I got the `llama 3.2 vision` models to work.
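A minimal sketch of what that looks like, assuming the server reads the `PREFIX_CACHING` environment variable at startup (the launcher command and model name below are illustrative, not from the original comment):

```shell
# Disable prefix caching before launching the server.
# PREFIX_CACHING=0 comes from the comment above; everything after
# the export is a hypothetical example of a typical launch step.
export PREFIX_CACHING=0

# e.g. (hypothetical launcher/model names):
#   text-generation-launcher --model-id meta-llama/Llama-3.2-11B-Vision-Instruct

# Confirm the variable is set in the server's environment.
echo "PREFIX_CACHING=$PREFIX_CACHING"
```

The same variable can also be passed inline for a one-off run, e.g. `PREFIX_CACHING=0 <launch-command>`, or via `docker run -e PREFIX_CACHING=0 ...` for containerized deployments.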

Hey, based on your logs I think this is expected behavior. The output of your `curl` to `/v1/chat/completions` reports `14` completion tokens. Based on your logs for the 1st request...