llm-foundry
LLM training code for Databricks foundation models
When attempting to load a sharded checkpoint, we (@prigoyal and I) hit the following error: ``` 595 │ /usr/lib/python3/dist-packages/composer/utils/checkpoint.py:287 in │ 596 │ load_checkpoint │ 597 │ │ 598 │ 284...
Previously, large files were read entirely at once via `file.read()`. This PR reads and tokenizes the file in chunks instead.
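The idea of chunked reading and tokenizing can be illustrated with a minimal sketch. This is not the PR's code: `tokenize_in_chunks`, `whitespace_tokenize`, and the `chunk_size` parameter are hypothetical names, and the whitespace tokenizer stands in for whatever tokenizer the real script uses. The sketch holds back a trailing partial word so that a word straddling a chunk boundary is not tokenized in two pieces.

```python
import io
from typing import Callable, Iterator, List, TextIO


def tokenize_in_chunks(
    stream: TextIO,
    tokenize: Callable[[str], List[str]],
    chunk_size: int = 1 << 20,
) -> Iterator[List[str]]:
    """Yield tokens for successive chunks of `stream` without loading it whole."""
    carry = ""  # trailing partial word held back from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = carry + chunk
        # Hold back everything after the last whitespace so that a word
        # straddling a chunk boundary is tokenized in one piece.
        cut = max(text.rfind(" "), text.rfind("\n"))
        if cut == -1:
            carry = text  # no boundary yet, keep accumulating
            continue
        carry = text[cut + 1:]
        yield tokenize(text[:cut + 1])
    if carry:
        yield tokenize(carry)


def whitespace_tokenize(text: str) -> List[str]:
    # Hypothetical stand-in for a real tokenizer.
    return text.split()


sample = io.StringIO("the quick brown fox jumps over the lazy dog")
tokens = [
    tok
    for batch in tokenize_in_chunks(sample, whitespace_tokenize, chunk_size=7)
    for tok in batch
]
```

With a real file, `stream` would be the object returned by `open(path, "r")`, and peak memory would scale with `chunk_size` rather than with the file size.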
Here is code we use to test our benchmark tasks by using a series of progressively more advanced models to see if the benchmarks effectively differentiate between them, and at...
I want to pretrain an LLM on 2T tokens using llm-foundry, but the data processing before training takes too long. Is there any way to accelerate it?
Adding Big Bench Hard subset as a set of combined CoT tasks, formatted according to the specification in [this repo](https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main). These tasks are quite large and quite slow. I don't...
I'm trying to implement DecoupledLionW_8bit in my fine-tuning script, but I get the following error: > ERROR: Could not find a version that satisfies the requirement mosaicml-turbo=0.0.2; extra == "gpu"...
The current code path for streaming finetuning datasets does not allow streaming from a local path (which works for text datasets out of the box and is also supported by `StreamingFinetuningDataset`...
The model should not be trained to predict the word after the eos_token, because it comes from a different sequence. This PR implements this logic. TODO: Experimental verification.
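A minimal sketch of this masking, assuming packed sequences separated by an EOS token and next-token targets built by shifting the inputs by one position. The function name `mask_targets_after_eos` and the example token IDs are hypothetical, and the PR's actual implementation may differ. The value `-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, so positions carrying it contribute nothing to the loss.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_targets_after_eos(input_ids: List[int], eos_token_id: int) -> List[int]:
    """Build next-token targets, ignoring the target that follows each EOS.

    targets[i] is the token the model should predict at position i, that is
    input_ids[i + 1]. The last position has no next token, so it is ignored.
    """
    targets = input_ids[1:] + [IGNORE_INDEX]
    for i, tok in enumerate(input_ids[:-1]):
        if tok == eos_token_id:
            # The next token starts a different sequence: do not train on it.
            targets[i] = IGNORE_INDEX
    return targets


# Three packed sequences, with 0 as the (hypothetical) EOS token ID.
packed = [5, 6, 0, 7, 8, 0, 9]
targets = mask_targets_after_eos(packed, eos_token_id=0)
```

Note that predicting the EOS token itself is still trained; only the prediction made from the EOS position is masked.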
Implement F1 score for reference-based grading of QA tasks. This PR depends on Max's [refactor](https://github.com/mosaicml/composer/pull/2713). Added QuAC, Natural Questions, and NarrativeQA. Tested on mpt-7b-instruct: ``` | Category | Benchmark...
Enable Delta table as input for CPT. For CPT, you need to provide some tokenizer arguments so the resulting MDS dataset can be written: python scripts/data_prep/convert_delta_to_json.py --delta_table_name main.streaming.random_cpt_table --processes 128...