Carlos Mocholí
PyTorch has added support for arbitrary custom masks, which are meant to be performant when used with `torch.compile`: https://github.com/pytorch/pytorch/pull/121845 They are also considering more generic API changes that...
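For anyone landing here, a minimal sketch of that API (assuming the PR refers to the `score_mod`/FlexAttention work and a recent PyTorch build with a CUDA device available):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention


def causal(score, b, h, q_idx, kv_idx):
    # mask out positions where the key comes after the query
    return torch.where(q_idx >= kv_idx, score, -float("inf"))


# compiling is what makes the custom mask fast
compiled_flex = torch.compile(flex_attention)

q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = compiled_flex(q, k, v, score_mod=causal)  # (1, 8, 1024, 64)
```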
@samsja CUDNN attention is most likely the best option today that supports attention masks (see the Flash Attention 3 paper figures). xformers is not as competitive, on H100s at least.
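As a sketch of forcing that backend through SDPA (assuming a recent PyTorch where `SDPBackend.CUDNN_ATTENTION` exists; the math backend is listed as a fallback in case the given mask isn't supported):

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(1, 8, 2048, 64, device="cuda", dtype=torch.bfloat16) for _ in range(3))
# boolean mask: True means the query may attend to that key
mask = torch.ones(2048, 2048, device="cuda", dtype=torch.bool).tril()

# prefer cuDNN attention, fall back to the math backend otherwise
with sdpa_kernel([SDPBackend.CUDNN_ATTENTION, SDPBackend.MATH]):
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```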
Our implementation follows the Mistral reference most closely: https://github.com/mistralai/mistral-src/blob/main/moe_one_file_ref.py#L205-L212. Please also note our docstring reference: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/model.py#L335 I cannot comment on why HF chose to follow a different approach. There doesn't...
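To make the routing pattern in that reference concrete, here is a minimal sketch (names like `moe_forward`, `gate`, and `experts` are illustrative, not our actual API):

```python
import torch
import torch.nn.functional as F


def moe_forward(x: torch.Tensor, gate: torch.nn.Linear, experts: list, top_k: int = 2) -> torch.Tensor:
    # x: (tokens, dim); gate maps dim -> n_experts
    logits = gate(x)                                     # (tokens, n_experts)
    weights, selected = torch.topk(logits, top_k, dim=-1)
    # softmax is taken over the selected logits only, as in the Mistral reference
    weights = F.softmax(weights, dim=-1, dtype=torch.float).to(x.dtype)
    out = torch.zeros_like(x)
    for i, expert in enumerate(experts):
        token_idx, k_idx = torch.where(selected == i)    # tokens routed to expert i
        if token_idx.numel():
            out[token_idx] += weights[token_idx, k_idx, None] * expert(x[token_idx])
    return out
```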
You are right. It's also missing the clearing of the adapter KV cache. It could be done by overriding https://github.com/Lightning-AI/litgpt/blob/main/litgpt/model.py#L133-L136 for the adapter model. A PR with the fixes would...
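For reference, a rough sketch of what that override could look like (assuming the linked lines are the base `GPT.clear_kv_cache` and that the adapter attention blocks keep an `adapter_kv_cache` attribute; untested):

```python
from litgpt.adapter import GPT as AdapterGPT


class GPT(AdapterGPT):
    def clear_kv_cache(self) -> None:
        # clear the regular kv caches and the mask cache as the base class does
        super().clear_kv_cache()
        # additionally drop the adapter-specific kv cache held by each attention block
        for block in self.transformer.h:
            block.attn.adapter_kv_cache = None
```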
I use Linux. @Magniveo What's the error you get exactly? Are you able to reproduce the issue in a script that only runs those lines? Do you have proper permissions...
Does

```python
import shutil

shutil.rmtree(safetensor_path)
```

work?
My bad. I thought `rmtree` worked for files.
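For completeness, a small sketch of the distinction (assuming `safetensor_path` is the path from the snippet above):

```python
import shutil
from pathlib import Path

path = Path(safetensor_path)  # the path from the snippet above
if path.is_dir():
    shutil.rmtree(path)  # rmtree only removes directory trees
else:
    path.unlink()        # plain files need unlink / os.remove
```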
No concerns. We already set some globals: https://github.com/Lightning-AI/litgpt/blob/wip/litgpt%2F__main__.py#L83-L86
> What if we set the global flag only in our entry point scripts and not at the import level? Unless I'm missing something, this is what we do already...
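As an illustration of what setting such a flag in the entry point looks like (the specific flag here is just an example, not necessarily what the linked lines set):

```python
import torch


def main() -> None:
    # process-wide flags are set once, inside the CLI entry point,
    # so importing litgpt as a library never changes global state
    torch.set_float32_matmul_precision("high")
    # ... dispatch to the requested subcommand here


if __name__ == "__main__":
    main()
```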
2) is what I would prefer. But how does one integrate it into the parser? Is there an extension point in `ActionConfigPath` so that an arbitrary transformation can be applied...