Sebastian Raschka
This separates the single `bias` config into 3 separate bias configs: QKV bias, attention projection bias, and MLP bias. This would be necessary to implement Grok, for example, which uses...
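As a rough sketch, the split could look like the following; the three field names are placeholders for illustration, not the final litgpt config API.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Sketch of splitting the single `bias` flag into three independent flags.
    # The field names below are assumptions, not the actual litgpt Config fields.
    attn_qkv_bias: bool = False   # bias on the fused QKV projection
    attn_proj_bias: bool = False  # bias on the attention output projection
    mlp_bias: bool = False        # bias on the MLP layers
```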
Executing the download command twice will redownload a model:

```bash
litgpt download --repo_id microsoft/phi-2
```

```bash
litgpt download --repo_id microsoft/phi-2
```

We could check for existing model files and not...
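A minimal sketch of such a check, assuming models land under `checkpoints/<repo_id>`; the function name and directory layout are assumptions, not the actual litgpt internals.

```python
from pathlib import Path
import subprocess


def download_if_missing(repo_id: str, checkpoint_dir: Path = Path("checkpoints")) -> None:
    """Skip the download when the model files are already present."""
    model_dir = checkpoint_dir / repo_id
    if model_dir.is_dir() and any(model_dir.iterdir()):
        print(f"Found existing files in {model_dir}, skipping download.")
        return
    # Fall back to the existing CLI command for the actual download.
    subprocess.run(["litgpt", "download", "--repo_id", repo_id], check=True)
```

One could also add a flag to override the check and force a redownload.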
LoRA+
Another interesting improvement idea for our LoRA implementation:

> LoRA+: Efficient Low Rank Adaptation of Large Models, [https://arxiv.org/abs/2402.12354](https://arxiv.org/abs/2402.12354)

In short, they propose different learning rates for matrices A and B...
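Below is a minimal sketch of how this could be wired up via optimizer parameter groups, assuming the LoRA matrices can be identified by `lora_A`/`lora_B` in their parameter names; the `lr_ratio` default is an illustrative assumption, not a value from the paper.

```python
import torch


def loraplus_param_groups(model: torch.nn.Module, base_lr: float = 1e-4, lr_ratio: float = 16.0):
    """Give the LoRA B matrices a larger learning rate than the A matrices (and everything else)."""
    group_a, group_b = [], []
    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Assumes LoRA B matrices carry a "lora_B" substring in their parameter name.
        (group_b if "lora_B" in name else group_a).append(param)
    return [
        {"params": group_a, "lr": base_lr},
        {"params": group_b, "lr": base_lr * lr_ratio},
    ]


# usage: optimizer = torch.optim.AdamW(loraplus_param_groups(model))
```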
Adds the popular and fully open-source OLMo models by Allen AI.

- [x] Implement model download
- [x] Test tokenizer
- [x] Implement HF checkpoint conversion
- [x] Clean up...
The `pretrain.py` script lists the Alpaca dataset and all other finetuning datasets, but I don't think they are supported for pretraining. E.g.,

```bash
python litgpt/pretrain.py \
  --data litgpt.data.Alpaca2k \
  --model_name...
```
Changing only one line in the config file, namely

```yaml
quantize: bnb.nf4
```

increased the memory usage from 14 GB to 18 GB.

```
Epoch 5 | iter 965 step...
```
One small issue I see with the current config files is that we are using `bf16-true`. This is recommended in my opinion, but certain hardware doesn't support it. In this...
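A minimal sketch of a runtime check that falls back to another precision when bfloat16 is unavailable; the `16-mixed` fallback is an assumption, not something the current configs prescribe.

```python
import torch


def pick_precision() -> str:
    """Return "bf16-true" when the GPU supports bfloat16, otherwise a fallback."""
    if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
        return "bf16-true"
    return "16-mixed"  # e.g., older GPUs without bfloat16 support
```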
The epoch number is incremented once more in the last log line before training finishes, so it is no longer correct. This is a problem in all finetuning scripts:

```
Epoch 4...
```
It would be nice to also have some configs for MPS machines (like Ollama does). I can run some small representative models on my MacBook.
Maybe this can be analogous to the LoRA weight merging script usage.