Sebastian Raschka

821 comments of Sebastian Raschka

If this PR gets revived some time, we should check out the `qkv_reassemble` function from #1341

Wow thanks for resurrecting it and pushing it forward!

That's fair; we would have to run the script with both fp16 and bf16. But this is not that different from saying "if your GPU does not support `--precision bf16-true`...
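A minimal sketch of the fallback idea: pick fp16 when the GPU lacks bf16 support. The helper name `pick_precision` and the wiring via `torch.cuda.is_bf16_supported()` are illustrative assumptions, not LitGPT's actual CLI logic; the precision strings match Lightning Fabric's `--precision` options.

```python
def pick_precision(gpu_supports_bf16: bool) -> str:
    """Return a Lightning-style precision flag with an fp16 fallback.

    Hypothetical helper: in practice the flag could come from
    `torch.cuda.is_bf16_supported()` when PyTorch is installed.
    """
    # Ampere-or-newer GPUs support bf16; older ones fall back to fp16.
    return "bf16-true" if gpu_supports_bf16 else "16-true"

print(pick_precision(True))   # bf16-true
print(pick_precision(False))  # 16-true
```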

If you have all the dependencies installed, that should be supported. You can check out the [tutorials/pretrain_tinyllama.md](https://github.com/Lightning-AI/litgpt/blob/main/tutorials/pretrain_tinyllama.md) tutorial in this repo. Let us know what results you get, I'd be...

Nice, I think in that case we can stay tuned. I wish there was a "snooze" option to hide an issue for like a few months and then get reminded...

That's a good question. We don't have a benchmark, but LitGPT already supports FlashAttention-2 via PyTorch's SDPA. The plan is to also support FlashAttention-3 (#1578)
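To illustrate the SDPA route mentioned above, here is a minimal sketch (shapes are arbitrary assumptions): `torch.nn.functional.scaled_dot_product_attention` dispatches to a FlashAttention-2 kernel on supported GPUs and falls back to the math implementation elsewhere, so the same call works on CPU.

```python
import torch
import torch.nn.functional as F

# Arbitrary example shapes: batch 2, 4 heads, sequence 8, head dim 16.
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

# On supported GPUs PyTorch can dispatch this to FlashAttention-2;
# on CPU it uses the math fallback, so the snippet runs anywhere.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 8, 16])
```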

Good question. The number of iterations depends on the batch size: one epoch means one full pass over the dataset, so if you have a smaller batch size this will...
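The relationship above boils down to simple arithmetic; a sketch with made-up numbers:

```python
import math

def iterations_per_epoch(dataset_size: int, batch_size: int) -> int:
    # One epoch = one full pass over the dataset, so a smaller
    # batch size means more iterations per epoch.
    return math.ceil(dataset_size / batch_size)

# Hypothetical dataset of 10,000 samples:
print(iterations_per_epoch(10_000, 64))   # 157
print(iterations_per_epoch(10_000, 128))  # 79
```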

Hey, just pinging to see if you are still interested in pursuing this PR. Personally, I think it'd be awesome to support the Yi models in LitGPT. There have been...

Thanks for the note. I am not sure if we ever supported MPS devices for pretraining. We can take a look some time, but I don't have a timeline for...

Yes, it should work on CPU devices