Sebastian Raschka
Thanks for the comment, and I 100% agree. Not sure why I made it unnecessarily complicated there. In my other book (Build an LLM from Scratch), I am using the...
That's a good question. LitGPT uses the `simple_evaluate` function from the LM evaluation harness under the hood: https://github.com/EleutherAI/lm-evaluation-harness/blob/058cfd0eeb022c0bc4862651a3ae08e4e046a106/lm_eval/evaluator.py#L48-L77 I currently don't see how one could override the paths for the...
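For reference, the harness can also be called directly along these lines. This is just a minimal sketch: the model name, tasks, and batch size below are placeholders, not what LitGPT passes internally.

```python
# Minimal sketch of calling the LM evaluation harness directly
# (placeholder model/tasks; not LitGPT's wrapper code).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder checkpoint
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```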
Hi there, and sorry about that. I think this was a Google Drive link from my university's Google Drive account. However, since I left the university, it has been...
Thanks for sharing. I don't currently have the capacity to add it, but if there is someone interested in adding it, PRs are welcome!
Hi there, thanks for suggesting! New models are always welcome. JetMoE is currently not on the priority list due to many other requests and features to be added, but if...
I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md
That's a good question and usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes due to naming conventions and sometimes because it may...
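One thing that often helps is printing the parameter names and shapes of both checkpoints side by side and matching them up manually, roughly like this (the file paths below are placeholders for whatever checkpoints you are mapping):

```python
# Rough sketch for comparing parameter names between two checkpoints;
# the .pth paths are placeholders, not files shipped with LitGPT.
import torch

original = torch.load("original_checkpoint.pth", map_location="cpu")
converted = torch.load("litgpt_checkpoint.pth", map_location="cpu")

for label, state_dict in [("original", original), ("converted", converted)]:
    print(f"--- {label} ---")
    for name, tensor in state_dict.items():
        print(f"{name:60s} {tuple(tensor.shape)}")
```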
I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE is only for the MLP layers...
Oh I see, the `Mixture of Attention heads (MoA)` part will be a bit tricky then; that's currently not supported by LitGPT and would have to be coded. It...
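To illustrate the distinction, here is a simplified sketch of what MLP-only expert routing looks like (made-up class and parameter names, not LitGPT's implementation); attention-expert (MoA) routing would need analogous code inside the attention block, which doesn't exist in LitGPT yet:

```python
# Simplified MLP-only mixture-of-experts sketch (illustrative names only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    def __init__(self, dim, hidden, n_experts=4, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        scores = self.gate(x)                          # (..., n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```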
This is currently not possible without code modification. It would be nice to add support for `train.max_steps`, but that's not implemented yet: https://github.com/Lightning-AI/litgpt/blob/221b7ef54161272162aa9b036f1ef3674f3160a4/litgpt/pretrain.py#L427
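For anyone who wants to patch this locally in the meantime, the change would amount to something like the following (purely illustrative; the names below are made up for this sketch and are not LitGPT's actual loop):

```python
# Purely illustrative: capping a training loop by a step budget.
# Names like max_steps and train_one_step are placeholders, not pretrain.py internals.
max_steps = 1000
max_iters = 100_000
step_count = 0

def train_one_step(iter_num):
    pass  # one optimizer step would happen here

for iter_num in range(max_iters):
    train_one_step(iter_num)
    step_count += 1
    if max_steps is not None and step_count >= max_steps:
        break
```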