Sebastian Raschka
- [ ] Verify Phi-3-mini-4k-instruct configs
- [ ] Add prompt style
- [ ] Add other config files
- [ ] Add test_model.py
- [ ] Add to test_prompts.py...
Maybe GaLore (#1192) should be changed from `GaloreArgs` to `OptimizerArgs` after all. Then we can also more easily consider other variants such as BAdam (BAdam: A Memory Efficient Full Parameter...
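To make the idea concrete, here is a minimal sketch of what a generic `OptimizerArgs` container could look like; the field names and defaults are illustrative assumptions, not LitGPT's actual API:

```python
# Hypothetical sketch: a generic OptimizerArgs dataclass that could replace
# the GaLore-specific GaloreArgs, leaving room for variants such as BAdam.
from dataclasses import dataclass, field


@dataclass
class OptimizerArgs:
    name: str = "adamw"  # e.g. "adamw", "galore", "badam" (illustrative)
    lr: float = 3e-4
    weight_decay: float = 0.01
    # optimizer-specific options, e.g. {"rank": 128} for a GaLore-style method
    extra: dict = field(default_factory=dict)


# Usage: one argument type for any optimizer variant.
args = OptimizerArgs(name="galore", extra={"rank": 128})
```

A single container like this keeps the CLI surface stable while optimizer-specific knobs live in `extra` (or in typed subclasses, if stricter validation is wanted).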
### Discussed in https://github.com/Lightning-AI/dl-fundamentals/discussions/17

Originally posted by **agaldran** January 16, 2023

As the title says, if a valid answer is not marked, this error will not be reported and...
Streaming works really nicely now with the latest litgpt version from main. The only minor issue is that it emits a stray `%` character, e.g., ``` ⚡ ~/streaming python streaming_client.py...
## 🚀 Feature

It would be nice to add a `--detach` mode, similar to `jekyll serve`, to detach the session.

### Motivation

This could be useful for testing purposes and...
I observed that Phi-3 full finetuning uses less memory than LoRA (see #1553); as discussed, this is something to look into. @Andrei-Aksionov
Opening this issue so we don't forget: Once #1545 is merged, let's also add sliding window attention to Mistral 0.1
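For reference, sliding window attention restricts each token to attend only to the most recent `W` positions instead of the full causal prefix. A minimal, framework-free sketch of the mask (illustrative only, not LitGPT's implementation):

```python
# Sketch of a sliding-window causal attention mask: query position i may
# attend to key position j only if j is within the last `window` positions,
# i.e. i - window < j <= i. Used by models such as Mistral 0.1.
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i is allowed to attend to key j."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(5, 3)
# Row 4 allows positions 2..4 only (causal, window of 3).
```

In an attention implementation, positions where the mask is `False` would be filled with `-inf` before the softmax.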
Gemma 2 checkpoints are out: https://x.com/clmt/status/1806342399347597589 Haven't had a chance to look into these, but hopefully they are not too different from Gemma 1 in terms of custom architecture components.
There has been a new LitData release (v0.2.7). We need to review the changes and see how they affect LitGPT.
Many users have asked or opened issues about whether there is a bug, because multi-GPU training can be slower than single-GPU training. This is not a LitGPT bug but happens because machines...
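A back-of-envelope model shows how this can happen without any bug: data-parallel training splits the compute across GPUs, but every step must all-reduce the gradients over the interconnect. All numbers below are hypothetical, for illustration only:

```python
# Sketch: per-step time for data-parallel training. Compute is divided by
# the number of GPUs, but gradients are all-reduced each step; a ring
# all-reduce moves roughly 2*(n-1)/n of the gradient bytes per GPU.
def step_time(compute_s: float, grad_bytes: float,
              n_gpus: int, bw_bytes_per_s: float) -> float:
    if n_gpus == 1:
        return compute_s  # no communication needed
    comm_s = 2 * (n_gpus - 1) / n_gpus * grad_bytes / bw_bytes_per_s
    return compute_s / n_gpus + comm_s


# Hypothetical numbers: 14 GB of fp16 gradients, a slow ~12 GB/s link,
# and 2 s of compute per step on one GPU.
single = step_time(2.0, 14e9, 1, 12e9)  # 2.0 s
multi = step_time(2.0, 14e9, 4, 12e9)   # 0.5 s compute + 1.75 s comm
```

With a slow interconnect, the communication term dominates and four GPUs end up slower per step than one, which matches what users are observing.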