Sebastian Raschka
- [ ] Verify Phi-3-mini-4k-instruct configs
- [ ] Add prompt style
- [ ] Add other config files
- [ ] Add test_model.py
- [ ] Add to test_prompts.py...
Maybe GaLore (#1192) should be changed from `GaloreArgs` to `OptimizerArgs` after all. Then we can also more easily consider other variants such as BAdam (BAdam: A Memory Efficient Full Parameter...
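To make the idea concrete, here is a minimal sketch of what a generic `OptimizerArgs` container could look like; the field names and defaults are illustrative assumptions, not LitGPT's actual API:

```python
# Hypothetical sketch: a generic OptimizerArgs dataclass that could replace
# the GaLore-specific GaloreArgs, leaving room for variants such as BAdam.
from dataclasses import dataclass, field


@dataclass
class OptimizerArgs:
    name: str = "adamw"  # e.g. "adamw", "galore", "badam" (illustrative)
    lr: float = 3e-4
    weight_decay: float = 0.01
    # optimizer-specific options, e.g. {"rank": 128} for a GaLore-style method
    extra: dict = field(default_factory=dict)


# Usage: one argument type for any optimizer variant.
args = OptimizerArgs(name="galore", extra={"rank": 128})
```

A single container like this keeps the CLI surface stable while optimizer-specific knobs live in `extra` (or in typed subclasses, if stricter validation is wanted).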
### Discussed in https://github.com/Lightning-AI/dl-fundamentals/discussions/17

Originally posted by **agaldran** January 16, 2023

As the title says, if a valid answer is not marked, this error will not be reported and...
Streaming works really nicely now with the latest litgpt version from main. The only minor issue is that it emits a stray `%` character, e.g., ``` ⚡ ~/streaming python streaming_client.py...
## 🚀 Feature

It would be nice to add a `--detach` mode, similar to `jekyll serve`, to detach the session.

### Motivation

This could be useful for testing purposes and...
I observed that Phi-3 full finetuning uses less memory than LoRA (see #1553); as discussed, this is something to look into. @Andrei-Aksionov
Opening this issue so we don't forget: Once #1545 is merged, let's also add sliding window attention to Mistral 0.1
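For reference, sliding window attention restricts each token to attend only to the most recent `W` positions instead of the full causal prefix. A minimal, framework-free sketch of the mask (illustrative only, not LitGPT's implementation):

```python
# Sketch of a sliding-window causal attention mask: query position i may
# attend to key position j only if j is within the last `window` positions,
# i.e. i - window < j <= i. Used by models such as Mistral 0.1.
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i is allowed to attend to key j."""
    return [
        [(i - window < j <= i) for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(5, 3)
# Row 4 allows positions 2..4 only (causal, window of 3).
```

In an attention implementation, positions where the mask is `False` would be filled with `-inf` before the softmax.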
Gemma 2 checkpoints are out: https://x.com/clmt/status/1806342399347597589 Haven't had a chance to look into these, but hopefully they are not too different from Gemma 1 in terms of custom architecture components.
There has been a new LitData release (v0.2.7). We need to review the changes and see how they affect LitGPT.
Many users have asked or opened issues about whether there is a bug, because multi-GPU training can be slower than single-GPU training. This is not a LitGPT bug but happens because machines...
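A back-of-envelope model shows how this can happen without any bug: data-parallel training splits the compute across GPUs, but every step must all-reduce the gradients over the interconnect. All numbers below are hypothetical, for illustration only:

```python
# Sketch: per-step time for data-parallel training. Compute is divided by
# the number of GPUs, but gradients are all-reduced each step; a ring
# all-reduce moves roughly 2*(n-1)/n of the gradient bytes per GPU.
def step_time(compute_s: float, grad_bytes: float,
              n_gpus: int, bw_bytes_per_s: float) -> float:
    if n_gpus == 1:
        return compute_s  # no communication needed
    comm_s = 2 * (n_gpus - 1) / n_gpus * grad_bytes / bw_bytes_per_s
    return compute_s / n_gpus + comm_s


# Hypothetical numbers: 14 GB of fp16 gradients, a slow ~12 GB/s link,
# and 2 s of compute per step on one GPU.
single = step_time(2.0, 14e9, 1, 12e9)  # 2.0 s
multi = step_time(2.0, 14e9, 4, 12e9)   # 0.5 s compute + 1.75 s comm
```

With a slow interconnect, the communication term dominates and four GPUs end up slower per step than one, which matches what users are observing.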