Carlos Mocholí

Adrian suggests doing this together with a Studio that includes the pretokenized data.

Which tokenizer config from Hugging Face are you trying to load?

When you finetune, you load existing Hugging Face Hub weights and tokenizer. LitGPT then copies the tokenizer into your finetuned output directory so that it can be loaded in subsequent steps. Did...
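
For illustration, a minimal sketch of what that copy step amounts to (the `copy_tokenizer` helper and the exact file names are assumptions, not LitGPT's actual implementation):

```python
import shutil
from pathlib import Path


def copy_tokenizer(checkpoint_dir: Path, out_dir: Path) -> None:
    """Copy tokenizer files from the base checkpoint into the finetuned output
    directory so that later steps (generate, evaluate, convert) can load them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for name in ("tokenizer.json", "tokenizer.model", "tokenizer_config.json"):
        src = checkpoint_dir / name
        if src.is_file():
            shutil.copy(src, out_dir / name)
```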

Just saw your last message. It looks like it's being treated as an HF tokenizer instead of a SentencePiece tokenizer, so this line must be resolving to `False`: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/tokenizer.py#L21
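
Roughly, that backend selection boils down to a file check along these lines (a simplified sketch, not the exact code at the link above):

```python
from pathlib import Path


def detect_tokenizer_backend(checkpoint_dir: Path) -> str:
    """Pick the tokenizer backend based on which files the checkpoint ships."""
    if (checkpoint_dir / "tokenizer.model").is_file():
        return "sentencepiece"  # SentencePiece model file present
    if (checkpoint_dir / "tokenizer.json").is_file():
        return "huggingface"    # fall back to the HF `tokenizers` JSON file
    raise NotImplementedError(f"No supported tokenizer file found in {checkpoint_dir}")
```

If the SentencePiece branch is being skipped, the checkpoint directory most likely doesn't contain a `tokenizer.model` file.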

Which `--checkpoint_dir` did you use with LoRA? I can try to follow the same steps you did to see if I end up with the same error.

Sure. It will need to be supported in Lightning first, though. @awaelchli already had a look, but there are some technical limitations to overcome.

> I'm using just one gpu, so I'm initializing the fabric object with

In this case, `empty_init=False` is used: https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/lora.py#L170, so initialization should be happening normally.
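
As a point of reference, a minimal sketch of how `empty_init` is passed to Fabric's `init_module` in a single-GPU setup (the model here is just a placeholder, not the LitGPT LoRA model):

```python
import torch
from lightning.fabric import Fabric

fabric = Fabric(devices=1)  # single GPU, as in the report above
fabric.launch()

# With a single process, empty_init=False materializes the weights and
# initializes them normally instead of creating them on the meta device.
with fabric.init_module(empty_init=False):
    model = torch.nn.Linear(4096, 4096)  # placeholder for the actual model
```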

After a brief skim of the HF implementation, I don't see any blockers to supporting it. Contributions are welcome!

cc @awaelchli, if you'd like to answer

> if a document, article, instruction/output pair exceeds the max sequence length, how is it treated?

Depends on the data preparation, but our scripts trim it (see the sketch after this comment): https://github.com/Lightning-AI/lit-gpt/blob/0791c52a944f022a5cee91ed1e47288830efb72c/scripts/prepare_alpaca.py#L116-L117

> What about...
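
Regarding the trimming mentioned above, a rough sketch of what it amounts to (the `tokenize_and_trim` helper and its signature are assumptions for illustration, not the linked script verbatim):

```python
def tokenize_and_trim(tokenizer, text: str, max_seq_length: int):
    """Tokenize a sample and drop any tokens beyond the maximum sequence length."""
    ids = tokenizer.encode(text)   # token ids for the full prompt + response
    return ids[:max_seq_length]    # overly long samples are simply truncated
```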