LLMs-from-scratch
Fix bug in masking when kv cache is used.
Thank you for creating this project, I learned a lot from it!
There seems to be a small bug during masking when kv cache is enabled:
- W/o kv cache, `mask_bool = self.mask.bool()[:num_tokens, :num_tokens]` yields the intended result.
- W/ kv cache, `num_tokens` would be set to 1, and `mask_bool` would be a tensor of shape (1, 1). However, we want `mask_bool` to be a tensor of shape (1, num_tokens_K).
The following changes address this bug.
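
To illustrate the shape mismatch described above, here is a minimal, library-free sketch using nested lists in place of tensors. The helper names `causal_mask` and `slice_mask` and the `query_start` parameter are hypothetical and used only for illustration; this is not necessarily how the actual change is written.

```python
def causal_mask(context_length):
    # True marks positions to mask out (key index > query index),
    # like an upper-triangular mask that excludes the diagonal.
    return [[k > q for k in range(context_length)]
            for q in range(context_length)]


def slice_mask(mask, query_start, num_tokens_q, num_tokens_k):
    # Rows: absolute positions of the current queries.
    # Columns: every key visible so far (cached keys plus new ones).
    rows = mask[query_start:query_start + num_tokens_q]
    return [row[:num_tokens_k] for row in rows]


mask = causal_mask(6)

# W/o kv cache: 4 queries attend over the same 4 keys.
no_cache = slice_mask(mask, query_start=0, num_tokens_q=4, num_tokens_k=4)

# W/ kv cache: 3 keys are cached and 1 new token arrives at position 3.
with_cache = slice_mask(mask, query_start=3, num_tokens_q=1, num_tokens_k=4)

# Slicing with [:1, :1] instead produces a (1, 1) mask.
buggy = slice_mask(mask, query_start=0, num_tokens_q=1, num_tokens_k=1)

print(len(with_cache), len(with_cache[0]))  # rows, columns
print(len(buggy), len(buggy[0]))
```

In this sketch the cached case yields one row with `num_tokens_K` columns, and that row matches the last row of the uncached mask, so the new token sees the same keys either way.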