Results 2 issues of [email protected]

### What happened?

Hi, when stress testing llama-server (--parallel 3, prompt="Count 1 to 10000 in words") while running deepseek-coder-v2:16b-lite-instruct-q8_0, I got this assertion error in the logs and everything stopped...

bug-unconfirmed
medium severity

### Description

Hello, I am testing a port of my agentic app to Pydantic AI, and I would like to request easy support for prompt caching ["cache-control"] = {"type": "ephemeral"} for...