Sebastian Raschka
Thanks for the comment, and I 100% agree. Not sure why I made it unnecessarily complicated there. In my other book (Build an LLM from Scratch), I am using the...
That's a good question. LitGPT uses the `simple_evaluate` function from the LM evaluation harness under the hood: https://github.com/EleutherAI/lm-evaluation-harness/blob/058cfd0eeb022c0bc4862651a3ae08e4e046a106/lm_eval/evaluator.py#L48-L77 I currently don't see how one could override the paths for the...
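For reference, the harness can also be called directly along these lines. This is just a minimal sketch: the model name, tasks, and batch size below are placeholders, not what LitGPT passes internally.

```python
# Minimal sketch of calling the LM evaluation harness directly
# (placeholder model/tasks; not LitGPT's wrapper code).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder checkpoint
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"])
```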
Hi there, and sorry about that. I think this was a Google Drive link from my university's Google Drive account. However, since I left the university, it has been...
Thanks for sharing. I don't currently have the capacity to add it, but if there is someone interested in adding it, PRs are welcome!
Hi there, thanks for suggesting! New models are always welcome. JetMoE is currently not on the priority list due to many other requests and features to be added, but if...
I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md
That's a good question and usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes due to naming conventions and sometimes because it may...
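One thing that often helps is printing the parameter names and shapes of both checkpoints side by side and matching them up manually, roughly like this (the file paths below are placeholders for whatever checkpoints you are mapping):

```python
# Rough sketch for comparing parameter names between two checkpoints;
# the .pth paths are placeholders, not files shipped with LitGPT.
import torch

original = torch.load("original_checkpoint.pth", map_location="cpu")
converted = torch.load("litgpt_checkpoint.pth", map_location="cpu")

for label, state_dict in [("original", original), ("converted", converted)]:
    print(f"--- {label} ---")
    for name, tensor in state_dict.items():
        print(f"{name:60s} {tuple(tensor.shape)}")
```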
I haven't read the JetMoE paper; do they also have different attention experts? In that case, this would not be supported yet. The LlamaMoE is only for the MLP layers...
Oh I see, the `Mixture of Attention heads (MoA)` part will be a bit tricky then; that's currently not supported by LitGPT and would have to be coded. It...
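To illustrate the distinction, here is a simplified sketch of what MLP-only expert routing looks like (made-up class and parameter names, not LitGPT's implementation); attention-expert (MoA) routing would need analogous code inside the attention block, which doesn't exist in LitGPT yet:

```python
# Simplified MLP-only mixture-of-experts sketch (illustrative names only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    def __init__(self, dim, hidden, n_experts=4, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):
        scores = self.gate(x)                          # (..., n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1) # route each token to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out
```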
This is currently not possible without code modification. It would be nice to add support for `train.max_steps`, but that's not implemented yet: https://github.com/Lightning-AI/litgpt/blob/221b7ef54161272162aa9b036f1ef3674f3160a4/litgpt/pretrain.py#L427
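For anyone who wants to patch this locally in the meantime, the change would amount to something like the following (purely illustrative; the names below are made up for this sketch and are not LitGPT's actual loop):

```python
# Purely illustrative: capping a training loop by a step budget.
# Names like max_steps and train_one_step are placeholders, not pretrain.py internals.
max_steps = 1000
max_iters = 100_000
step_count = 0

def train_one_step(iter_num):
    pass  # one optimizer step would happen here

for iter_num in range(max_iters):
    train_one_step(iter_num)
    step_count += 1
    if max_steps is not None and step_count >= max_steps:
        break
```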