Sebastian Raschka
Yes, I think so. But one last question: after the update, have you double-checked / tested it on custom paths that don't start with `"checkpoints"`, like

```bash
litgpt chat my_custom_dir/google/gemma-2-9b-it...
```
Nice, thanks for checking! Looks all good to me now :)
Good call. I can take care of these next week
Good point, I agree
Good point. Thanks again for the contribution!
Good point. I think the main thing here is that if you have large amounts of text, you would store it in a compressed or pretokenized format, and perhaps also...
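To make the pretokenization idea concrete, here's a minimal sketch (assuming a Hugging Face `transformers` tokenizer; the file names `corpus.txt` and `corpus_tokens.npy` are just placeholders, not litgpt conventions):

```python
# Minimal sketch: pretokenize a large text file once and store the token
# IDs in a compact binary format, instead of re-tokenizing raw text on
# every run. Assumes the Hugging Face `transformers` API.
import numpy as np
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")

token_ids = []
with open("corpus.txt") as f:  # placeholder path
    for line in f:
        token_ids.extend(tokenizer.encode(line))

# uint16 covers vocab sizes up to 65,535, which halves the storage
# compared to int32; use a wider dtype for larger vocabularies.
np.save("corpus_tokens.npy", np.asarray(token_ids, dtype=np.uint16))

# Later, the tokens can be memory-mapped without loading everything:
# tokens = np.load("corpus_tokens.npy", mmap_mode="r")
```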
Thanks for reporting, and hm, yes, this is weird. I can reproduce it:

### Pretraining

```bash
litgpt pretrain \
  --model_name pythia-14m \
  --tokenizer_dir checkpoints/EleutherAI/pythia-14m \
  --out_dir my_test_dir \
  --data TextFiles...
```
Hi there, do you remember what the output was before the conversion? It would be useful to know so we can make sure the model was trained well.
I would be open to adding these models. If it helps, I've recently written a how-to guide here: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md
Hi there, could you try this with a very small text example that only consists of a few entries, e.g., repeated versions of the entry you showed:

```json
[
    {...
```
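In case a concrete starting point helps, a tiny test file might look roughly like this (a sketch assuming litgpt's Alpaca-style JSON schema with `instruction`/`input`/`output` keys; substitute the fields of your actual entry):

```json
[
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"},
    {"instruction": "Translate to French.", "input": "Hello", "output": "Bonjour"}
]
```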