Carlos Mocholí

427 comments by Carlos Mocholí

Hi friends. You should be able to check this by jitting a model with the `torch.compile` executor enabled and making sure that `fullgraph=True` is set (which disallows graph breaks). It should be an...
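A minimal sketch of the same kind of check using plain `torch.compile`. This is an assumption: the comment refers to a jitted model with the `torch.compile` executor, whose options may differ from the ones shown here. With `fullgraph=True`, Dynamo raises an error instead of silently splitting the function at a graph break. `backend="eager"` only skips code generation so the check stays fast. `clean`, `branchy`, and `compiles_without_breaks` are hypothetical example functions.

```python
import torch
import torch._dynamo


def clean(x: torch.Tensor) -> torch.Tensor:
    # Pure tensor ops: Dynamo can capture this as a single graph.
    return torch.relu(x) * 2


def branchy(x: torch.Tensor) -> torch.Tensor:
    # Branching on a tensor's value is data-dependent control flow,
    # which forces a graph break.
    if x.sum() > 0:
        return x * 2
    return x - 1


def compiles_without_breaks(fn, *args) -> bool:
    """Return True if `fn` can be captured as one graph for these inputs."""
    torch._dynamo.reset()  # start from a clean compile cache
    compiled = torch.compile(fn, fullgraph=True, backend="eager")
    try:
        compiled(*args)
    except Exception:
        # With fullgraph=True, a graph break surfaces as an exception
        # instead of a silent fallback to eager Python.
        return False
    return True


x = torch.randn(4)
print(compiles_without_breaks(clean, x))
print(compiles_without_breaks(branchy, x))
```

`clean` should pass the check, while `branchy` should be rejected because of its data-dependent `if`.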

Hey Seb! @nikitaved just merged a PR that improves the messaging here: #78. The TL;DR is that you want to run `examine` on the model to get a report of...

> But it's easily fixable. We can preallocate kv-cache for the first turn in the same fashion as in the generate script and then, if in the current turn the...
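A minimal sketch of a preallocated KV cache in the spirit of the quoted suggestion: the buffers are sized once for a maximum sequence length, and each step writes its keys and values in place at the given positions instead of concatenating new tensors. `PreallocatedKVCache`, its tensor layout, and `max_seq_length` are hypothetical names chosen for illustration, not the repository's API.

```python
import torch


class PreallocatedKVCache:
    """Fixed-size key/value buffers filled in place as generation advances."""

    def __init__(self, batch_size: int, n_heads: int, max_seq_length: int, head_dim: int):
        shape = (batch_size, n_heads, max_seq_length, head_dim)
        # Allocated once up front, so later steps never reallocate.
        self.k = torch.zeros(shape)
        self.v = torch.zeros(shape)

    def update(self, input_pos: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # Write the new keys/values along the sequence dimension (dim=2)
        # at the positions listed in `input_pos`.
        self.k.index_copy_(2, input_pos, k)
        self.v.index_copy_(2, input_pos, v)
        return self.k, self.v


cache = PreallocatedKVCache(batch_size=1, n_heads=2, max_seq_length=8, head_dim=4)

# Prefill: the prompt occupies positions 0..2.
prompt_k = torch.ones(1, 2, 3, 4)
cache.update(torch.arange(3), prompt_k, prompt_k.clone())

# Decode step: one new token is written at position 3.
step_k = torch.full((1, 2, 1, 4), 2.0)
k, v = cache.update(torch.tensor([3]), step_k, step_k.clone())
```

Because the buffers never change shape, every decode step sees the same tensor sizes, which tends to be friendlier to compilation than a cache that grows by concatenation.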

> It would be good to include it so that people are more aware of a recommended way to do this.

Fully agree.

> am not convinced that it is...

Sorry, this is not implemented at the moment, to keep the generation code simple to understand (it's inherited from nanoGPT).

I'm not 100% familiar with the advantages of left vs. right padding, so if one of you has a good resource on this, I'd appreciate it if you could share it.

From what I understand, right padding does not require creating an attention mask (so you can keep using flash attention), but then one cannot simply index with `-1` here, because the last position of a shorter prompt is a pad token: https://github.com/Lightning-AI/lit-gpt/blob/main/generate/base.py#L62
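A small sketch of the right-padding point, assuming a causal decoder and a batch of prompts with different lengths. With right padding, the pad tokens sit after each prompt, so a plain causal mask already keeps real tokens from attending to them; however, position `-1` is a pad for every shorter prompt, so the next-token logits have to be gathered at each sequence's last real position. `right_pad`, `last_token_logits`, `pad_id`, and the random tensor standing in for the model's output are all hypothetical.

```python
import torch


def right_pad(prompts: list[list[int]], pad_id: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Pad token-id lists on the right; also return each prompt's true length."""
    lengths = torch.tensor([len(p) for p in prompts])
    max_len = int(lengths.max())
    batch = torch.full((len(prompts), max_len), pad_id, dtype=torch.long)
    for i, p in enumerate(prompts):
        batch[i, : len(p)] = torch.tensor(p)
    return batch, lengths


def last_token_logits(logits: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
    """Select logits at each sequence's last real token: (B, T, V) -> (B, V)."""
    batch_idx = torch.arange(logits.size(0))
    return logits[batch_idx, lengths - 1]


prompts = [[5, 6, 7, 8], [9, 10]]
tokens, lengths = right_pad(prompts, pad_id=0)
# tokens is [[5, 6, 7, 8], [9, 10, 0, 0]]

vocab_size = 16
logits = torch.randn(tokens.size(0), tokens.size(1), vocab_size)  # stand-in for model(tokens)

next_logits = last_token_logits(logits, lengths)
naive = logits[:, -1]  # wrong for the second prompt: position 3 is a pad
```

With left padding the trade-off flips: `-1` points at the real last token in every row, but the leading pads then have to be hidden with an explicit attention mask.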