Sebastian Raschka

818 comments by Sebastian Raschka

Yes, I think so. But one last question: after the update, have you double-checked / tested it on custom paths that don't start with `"checkpoints"`, like ```bash litgpt chat my_custom_dir/google/gemma-2-9b-it...

Nice, thanks for checking! Looks all good to me now :)

Good call. I can take care of these next week.

Good point. Thanks again for the contribution!

Good point. I think the main thing here is that if you have large amounts of text, you would store it in a compressed or pretokenized format, and perhaps also...

Thanks for reporting, and hm, yes, this is weird. I can reproduce it: ### Pretraining ```bash litgpt pretrain \ --model_name pythia-14m \ --tokenizer_dir checkpoints/EleutherAI/pythia-14m \ --out_dir my_test_dir \ --data TextFiles...

Hi there, do you remember what the output was before the conversion? It would be useful to know to make sure that it was trained well.

I would be open to adding these models. If it helps, I've recently written a how-to guide here: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

Hi there, could you try this with a very small text example that only consists of a few entries, e.g., repeated versions of the entry you showed: ```json [ {...