Sebastian Raschka
Oh, we can actually keep it open; I think it would be a nice thing to add some day. Thanks for raising that!
I'd say we ideally need to add it to the pretrain code (https://github.com/Lightning-AI/litgpt/blob/main/litgpt/pretrain.py) so that it can be used in general with all datasets.
Thanks for reporting this. There are currently a few other issues on my list, but I hope to be able to address this sometime.
@awaelchli Thanks! I am fairly certain now that it was due to an incomplete KV-cache clearing (#1596).
We addressed this in #1596
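For context, here is a minimal sketch of why an incompletely cleared KV cache leaks state between generation calls. This is not litgpt's actual implementation; the `KVCache` class and its method names below are hypothetical and only illustrate the idea that zeroing the cached tensors is not enough if the fill pointer is left untouched.

```python
import torch

class KVCache:
    """Hypothetical per-layer key/value cache for autoregressive decoding."""
    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        self.k = torch.zeros(1, n_heads, max_seq_len, head_dim)
        self.v = torch.zeros(1, n_heads, max_seq_len, head_dim)
        self.length = 0  # number of cache positions currently filled

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        # k_new and v_new have shape (1, n_heads, t, head_dim)
        t = k_new.shape[2]
        self.k[:, :, self.length:self.length + t] = k_new
        self.v[:, :, self.length:self.length + t] = v_new
        self.length += t

    def reset(self) -> None:
        # A complete reset clears both the tensors and the fill pointer.
        # Forgetting either leaves stale keys/values that the next prompt
        # would attend to, producing subtly wrong generations.
        self.k.zero_()
        self.v.zero_()
        self.length = 0

cache = KVCache(max_seq_len=128, n_heads=4, head_dim=16)
cache.append(torch.randn(1, 4, 10, 16), torch.randn(1, 4, 10, 16))
cache.reset()  # call this between independent prompts
assert cache.length == 0 and cache.k.abs().sum() == 0
```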
Just read the paper and came here to suggest it, only to see you were already faster 😊 What I currently don't understand is why they need to shrink the number...
Thanks for flagging this. I know Mistral uses its own tokenizer, but I could swear this worked before. Something to look into sometime.
Good question; intuitively, I'd say that's a good point. @awaelchli, what are your thoughts here? I think you have some experience running pretraining on multi-node setups.
Oh yeah, that would be a good idea. I think it might require some adjustments in other places as well. It's on my backlog, but I'm not sure...
I completely missed this earlier. Many thanks for the fix, @d-kleine! @Arminius4, after the recent merge, this should now also work via the main branch:

```bash
pip install...
```