Nathan Tam

3 comments by Nathan Tam

Just realised the attention mask has already been mentioned in this PR, which is the reason I raised issue #1044.

Just updating the title of the PR for clarity. With these changes, the KV cache from any generation can now be reused for other requests.

The code in server.py has been modified to adapt to the changes made to `generate_step`. Prompt caching is available in server.py by default.
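For readers unfamiliar with the idea, here is a rough sketch of how reusing a KV cache across requests can work in principle: keep the cached entries for the longest token prefix shared with the new prompt and process only the remainder. This is a toy illustration, not the code from the PR. `ToyPromptCache`, `prepare`, `extend`, and `common_prefix_len` are hypothetical names, and the `kv` list is a placeholder for the per-layer key/value tensors a real cache would hold.

```python
from dataclasses import dataclass, field


def common_prefix_len(a, b):
    """Return the number of leading tokens that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class ToyPromptCache:
    # Hypothetical stand-in for a prompt cache: token ids plus one
    # placeholder "KV" entry per cached token.
    tokens: list = field(default_factory=list)
    kv: list = field(default_factory=list)

    def prepare(self, prompt):
        """Trim the cache to the shared prefix; return tokens still to process."""
        keep = common_prefix_len(self.tokens, prompt)
        # Leave at least one token unprocessed so a model would still
        # have an input from which to produce the next-token logits.
        keep = max(min(keep, len(prompt) - 1), 0)
        self.tokens = self.tokens[:keep]
        self.kv = self.kv[:keep]
        return prompt[keep:]

    def extend(self, new_tokens):
        """Record tokens as processed (a real model would append keys/values)."""
        for t in new_tokens:
            self.tokens.append(t)
            self.kv.append(("kv", t))


cache = ToyPromptCache()
first = cache.prepare([1, 2, 3, 4])       # nothing cached yet, process everything
cache.extend(first)
second = cache.prepare([1, 2, 3, 9, 10])  # shares the prefix [1, 2, 3]
cache.extend(second)
print(second)
```

In this sketch the second request only has to process the tokens after the shared prefix, which is the saving that prompt caching is meant to provide.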