Carlos Mocholí
PyTorch has added support for arbitrary custom masks, which are meant to be performant when used with `torch.compile`: https://github.com/pytorch/pytorch/pull/121845 They are also considering more generic API changes that...
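For anyone landing here, a minimal sketch of that API (assuming the PR refers to the `score_mod`/FlexAttention work and a recent PyTorch build with a CUDA device available):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention


def causal(score, b, h, q_idx, kv_idx):
    # mask out positions where the key comes after the query
    return torch.where(q_idx >= kv_idx, score, -float("inf"))


# compiling is what makes the custom mask fast
compiled_flex = torch.compile(flex_attention)

q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = compiled_flex(q, k, v, score_mod=causal)  # (1, 8, 1024, 64)
```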
@samsja CUDNN attention is most likely the best option today that supports attention masks (see the Flash Attention 3 paper figures). xformers is not as competitive, on H100s at least.
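As a sketch of forcing that backend through SDPA (assuming a recent PyTorch where `SDPBackend.CUDNN_ATTENTION` exists; the math backend is listed as a fallback in case the given mask isn't supported):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.bfloat16) for _ in range(3))
# boolean mask: True means the query may attend to that key
mask = torch.ones(2048, 2048, device="cuda", dtype=torch.bool).tril()

# prefer cuDNN attention, fall back to the math backend otherwise
with sdpa_kernel([SDPBackend.CUDNN_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```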
Our implementation follows the Mistral reference most closely: https://github.com/mistralai/mistral-src/blob/main/moe_one_file_ref.py#L205-L212. Please also note our docstring reference: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/model.py#L335 I cannot comment on why HF chose to follow a different approach. There doesn't...
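To make the routing pattern in that reference concrete, here is a minimal sketch (names like `moe_forward`, `gate`, and `experts` are illustrative, not our actual API):

```python
import torch
import torch.nn.functional as F


def moe_forward(x: torch.Tensor, gate: torch.nn.Linear, experts: list, top_k: int = 2) -> torch.Tensor:
    # x: (tokens, dim); gate maps dim -> n_experts
    logits = gate(x)                                     # (tokens, n_experts)
    weights, selected = torch.topk(logits, top_k, dim=-1)
    # softmax is taken over the selected logits only, as in the Mistral reference
    weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)
    out = torch.zeros_like(x)
    for i, expert in enumerate(experts):
        token_idx, k_idx = torch.where(selected == i)    # tokens routed to expert i
        if token_idx.numel():
            out[token_idx] += weights[token_idx, k_idx, None] * expert(x[token_idx])
    return out
```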
You are right. It's also missing the clearing of the adapter KV cache. It could be done by overriding https://github.com/Lightning-AI/litgpt/blob/main/litgpt/model.py#L133-L136 for the adapter model. A PR with the fixes would...
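For reference, a rough sketch of what that override could look like (assuming the linked lines are the base `GPT.clear_kv_cache` and that the adapter attention blocks keep an `adapter_kv_cache` attribute; untested):

```python
from litgpt.adapter import GPT as AdapterGPT


class GPT(AdapterGPT):
    def clear_kv_cache(self) -> None:
        # clear the regular kv caches and the mask cache as the base class does
        super().clear_kv_cache()
        # additionally drop the adapter-specific kv cache held by each attention block
        for block in self.transformer.h:
            block.attn.adapter_kv_cache = None
```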
I use Linux. @Magniveo What's the error you get exactly? Are you able to reproduce the issue in a script that only runs those lines? Do you have proper permissions...
Does

```python
import shutil

shutil.rmtree(safetensor_path)
```

work?
My bad. I thought `rmtree` worked for files.
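For completeness, a small sketch of the distinction (assuming `safetensor_path` is the path from the snippet above):

```python
import shutil
from pathlib import Path

path = Path(safetensor_path)  # the path from the snippet above
if path.is_dir():
    shutil.rmtree(path)  # rmtree only removes directory trees
else:
    path.unlink()        # plain files need unlink / os.remove
```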
No concerns. We already set some globals: https://github.com/Lightning-AI/litgpt/blob/wip/litgpt%2F__main__.py#L83-L86
> What if we set the global flag only in our entry point scripts and not at the import level? Unless I'm missing something, this is what we do already...
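As an illustration of what setting such a flag in the entry point looks like (the specific flag here is just an example, not necessarily what the linked lines set):

```python
import torch


def main() -> None:
    # process-wide flags are set once, inside the CLI entry point,
    # so importing litgpt as a library never changes global state
    torch.set_float32_matmul_precision("high")
    # ... dispatch to the requested subcommand here


if __name__ == "__main__":
    main()
```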
2) is what I would prefer. But how does one integrate it into the parser? Is there an extension point in `ActionConfigPath` so that an arbitrary transformation can be applied...