3 comments by Nathan Tam
Just realised the attention mask has already been mentioned in this PR, which is the reason I raised issue #1044.
Just updating the title of the PR for clarity. With these changes, the KV cache from any generation can now be reused for other requests.
The code in server.py has been updated to match the changes made to `generate_step`. Prompt caching is enabled in server.py by default.
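
For anyone following along, here is a rough sketch of the general idea behind reusing a KV cache across requests: keep the token ids the cache was built from, find the longest prefix shared with the new prompt, and only run the model on the remaining tokens. This is not the code in this PR; `PromptCache`, `split_prompt`, and `common_prefix_len` are hypothetical names used for illustration, and the actual KV tensors are left out so that only the bookkeeping is shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Return the number of leading token ids that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class PromptCache:
    """Hypothetical bookkeeping for a reusable KV cache.

    Only the token ids that the cache covers are tracked here; a real
    implementation would also hold (and trim) the per-layer key/value tensors.
    """

    tokens: List[int] = field(default_factory=list)

    def split_prompt(self, prompt: List[int]) -> Tuple[int, List[int]]:
        """Return (reused_token_count, tokens_still_to_process)."""
        n = common_prefix_len(self.tokens, prompt)
        # Assumption: the model needs at least one input token to produce
        # logits, so never reuse the entire prompt.
        if n == len(prompt) and n > 0:
            n -= 1
        # Drop cached entries past the shared prefix.
        self.tokens = self.tokens[:n]
        return n, prompt[n:]

    def extend(self, new_tokens: List[int]) -> None:
        """Record tokens that have been processed or generated since the split."""
        self.tokens.extend(new_tokens)
```

A second request that starts with the first request's prompt and generated tokens would then only need to process its new suffix:

```python
cache = PromptCache()
reused, todo = cache.split_prompt([1, 2, 3, 4])   # nothing cached yet
cache.extend(todo)                                # prompt processed
cache.extend([9])                                 # one generated token
reused, todo = cache.split_prompt([1, 2, 3, 4, 9, 5, 6])
```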