GeorgiosSmyrnis
This adds an option that predownloads data to local storage at the start of each checkpoint, which helps mitigate transient S3 errors.
This adds a unit test for mixing sources, both with and without sampling. This also fixes the naming scheme within tars, which could cause issues if two sequences within the...
This PR adds the capability to skip batches when needed, which is useful when resuming from a checkpoint and some batches should be skipped.
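The core of batch skipping on resume can be sketched as below; this is an illustrative helper, not the PR's code. Batches consumed before the checkpoint are drawn from the loader and discarded so the data stream lines up with where training left off.

```python
def skip_batches(loader, n_skip):
    """Yield batches from loader, discarding the first n_skip.

    When resuming mid-epoch, the first n_skip batches were already
    seen before the checkpoint, so they are drawn and thrown away to
    keep the data order consistent with an uninterrupted run.
    """
    it = iter(loader)
    for _ in range(n_skip):
        next(it, None)  # draw and discard; stops silently if exhausted
    yield from it
```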
This PR changes the GeGLU MLP and adds support for MQA.
This enables mixing of pretokenized data with the tokenize_shuffle.py script. This is enabled by the `--pretok_tars` flag, which assumes that the tarfiles provided to the script contain already tokenized data.
This adds a flag that prevents attention from crossing document boundaries, which are identified by the EOT token. The loss for the token immediately after the EOT token is ignored. TODO: add...
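A NumPy sketch of the masking described above, under the assumption that a token after an EOT starts a new document (the `eot_id` value and function names are hypothetical): the attention mask is the usual causal mask restricted to positions in the same document, and the loss mask zeroes out the position right after each EOT.

```python
import numpy as np

def document_causal_mask(tokens, eot_id):
    """Causal mask that additionally blocks attention across documents.

    Each position may only attend to earlier positions whose document
    index matches its own; document indices increment after every EOT.
    """
    tokens = np.asarray(tokens)
    # Document id for each position: bumps right after an EOT token.
    doc_ids = np.concatenate([[0], np.cumsum(tokens[:-1] == eot_id)])
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    causal = np.tril(np.ones((len(tokens), len(tokens)), dtype=bool))
    return same_doc & causal

def loss_mask(tokens, eot_id):
    """True where the loss is kept; the token right after an EOT is ignored."""
    tokens = np.asarray(tokens)
    keep = np.ones(len(tokens), dtype=bool)
    keep[1:][tokens[:-1] == eot_id] = False
    return keep
```

For a sequence like `[5, 6, EOT, 7, 8]`, positions 3 and 4 form a second document and cannot attend back to positions 0-2.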
This PR does the following: - Consolidates the instructions on how to run tokenization in a single README. - Adds a sample script on how to run tokenization on a...
Sometimes the model needs to run a few more training steps in a new epoch, and it would previously load an entire checkpoint's worth of data to do so. This PR limits...
`group_by_keys_nothrow` breaks with `webdataset>=0.2.90`.