Sebastian Raschka
This separates the single `bias` config into 3 separate bias configs: QKV bias, attention projection bias, and MLP bias. This would be necessary to implement Grok, for example, which uses...
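As a rough sketch, the split could look like the following; the three field names are placeholders for illustration, not the final litgpt config API.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Sketch of splitting the single `bias` flag into three independent flags.
    # The field names below are assumptions, not the actual litgpt Config fields.
    attn_qkv_bias: bool = False   # bias on the fused QKV projection
    attn_proj_bias: bool = False  # bias on the attention output projection
    mlp_bias: bool = False        # bias on the MLP layers
```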
Executing the download command twice will redownload a model:

```bash
litgpt download --repo_id microsoft/phi-2
```

```bash
litgpt download --repo_id microsoft/phi-2
```

We could check for existing model files and not...
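A minimal sketch of such a check, assuming models land under `checkpoints/<repo_id>`; the function name and directory layout are assumptions, not the actual litgpt internals.

```python
from pathlib import Path
import subprocess


def download_if_missing(repo_id: str, checkpoint_dir: Path = Path("checkpoints")) -> None:
    """Skip the download when the model files are already present."""
    model_dir = checkpoint_dir / repo_id
    if model_dir.is_dir() and any(model_dir.iterdir()):
        print(f"Found existing files in {model_dir}, skipping download.")
        return
    # Fall back to the existing CLI command for the actual download.
    subprocess.run(["litgpt", "download", "--repo_id", repo_id], check=True)
```

One could also add a flag to override the check and force a redownload.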
LoRA+
Another interesting improvement idea for our LoRA implementation:

> LoRA+: Efficient Low Rank Adaptation of Large Models, [https://arxiv.org/abs/2402.12354](https://arxiv.org/abs/2402.12354)

In short, they propose different learning rates for matrices A and B...
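Below is a minimal sketch of how this could be wired up via optimizer parameter groups, assuming the LoRA matrices can be identified by `lora_A`/`lora_B` in their parameter names; the `lr_ratio` default is an illustrative assumption, not a value from the paper.

```python
import torch


def loraplus_param_groups(model: torch.nn.Module, base_lr: float = 1e-4, lr_ratio: float = 16.0):
    """Give the LoRA B matrices a larger learning rate than the A matrices (and everything else)."""
    group_a, group_b = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumes LoRA B matrices carry a "lora_B" substring in their parameter name.
        (group_b if "lora_B" in name else group_a).append(param)
    return [
        {"params": group_a, "lr": base_lr},
        {"params": group_b, "lr": base_lr * lr_ratio},
    ]


# usage: optimizer = torch.optim.AdamW(loraplus_param_groups(model))
```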
Adds the popular and fully open-source OLMo models by Allen AI.

- [x] Implement model download
- [x] Test tokenizer
- [x] Implement HF checkpoint conversion
- [x] Clean up...
The `pretrain.py` script lists the Alpaca dataset and all other finetuning datasets, but I don't think they are supported for pretraining. E.g.,

```bash
python litgpt/pretrain.py \
  --data litgpt.data.Alpaca2k \
  --model_name...
```
Changing only one line in the config file, namely

```yaml
quantize: bnb.nf4
```

increased the memory usage from 14 GB to 18 GB.

```
Epoch 5 | iter 965 step...
```
One small issue I see with the current config files is that we are using `bf16-true`. This is recommended in my opinion, but certain hardware doesn't support it. In this...
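A minimal sketch of a runtime check that falls back to another precision when bfloat16 is unavailable; the `16-mixed` fallback is an assumption, not something the current configs prescribe.

```python
import torch


def pick_precision() -> str:
    """Return "bf16-true" when the GPU supports bfloat16, otherwise a fallback."""
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-true"
    return "16-mixed"  # e.g., older GPUs without bfloat16 support
```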
The epoch number is incremented once more in the last log line before training finishes, so it is no longer correct. This is a problem in all finetuning scripts:

```
Epoch 4...
```
It would be nice to also have some configs for MPS machines (like Ollama does). I can run some small representative models on my MacBook.
Maybe this can be analogous to the LoRA weight merging script usage.