llm-foundry
LLM training code for Databricks foundation models
For example, if I'm training on 2 nodes, should I have checkpoints for both rank 0 and rank 1? I have `save_filename: ep{epoch}-ba{batch}-rank{rank}.pt`, but checkpoints are only being saved for node 0 with rank...
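A small sketch may help show which filenames the template could produce. It assumes `{rank}` resolves to the *global* rank (numbered across all nodes) and that each node has 8 GPUs. The helper `expected_checkpoint_names` is hypothetical, not part of llm-foundry or Composer, and plain `str.format` stands in for the trainer's own templating. Under these assumptions, "rank 1" lives on node 0, and node 1's processes would be ranks 8 through 15. Whether every rank actually writes a file depends on how the trainer saves sharded vs. full state dicts, which this sketch does not model.

```python
from typing import List


def expected_checkpoint_names(
    template: str, epoch: int, batch: int, num_nodes: int, gpus_per_node: int
) -> List[str]:
    """List the filename each global rank would get for one save (hypothetical helper)."""
    names = []
    for node_rank in range(num_nodes):
        for local_rank in range(gpus_per_node):
            # Global ranks run across nodes: node 1's first GPU is rank gpus_per_node.
            global_rank = node_rank * gpus_per_node + local_rank
            # Plain str.format stands in for the trainer's filename templating.
            names.append(template.format(epoch=epoch, batch=batch, rank=global_rank))
    return names


names = expected_checkpoint_names(
    "ep{epoch}-ba{batch}-rank{rank}.pt", epoch=1, batch=100, num_nodes=2, gpus_per_node=8
)
print(names[0], names[8], len(names))  # ep1-ba100-rank0.pt ep1-ba100-rank8.pt 16
```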
I am using this code to fine-tune https://colab.research.google.com/drive/1DqKNPOzyMUXmJiJFvJITOahVDxCrA-wA#scrollTo=wShqQoppuv-h and I get this error. I am using a G5 EC2 instance **(A10)**. In the code above, when I change torch.bfloat16...
Hi, I want to fine-tune MPT-7B and I get an OOM error. This is what I run: `python ./llm-foundry/scripts/train/train.py ./llm-foundry/scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml train_loader.dataset.split=train eval_loader.dataset.split=test`. I have changed the seq_length to 512. And...
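For context on why lowering `seq_length` alone may not avoid OOM, here is a rough back-of-the-envelope estimate. It uses a common rule of thumb of about 16 bytes per parameter for Adam training with mixed precision (for example fp32 weights, fp32 gradients, and two fp32 Adam moment buffers), and it assumes roughly 7e9 parameters. This is not llm-foundry's own memory accounting: it ignores activations (the only part that shrinks with `seq_length`) and assumes the state is sharded evenly across GPUs, as with FSDP full sharding. The function `training_state_gb` is hypothetical.

```python
def training_state_gb(num_params: float, num_gpus: int = 1, bytes_per_param: int = 16) -> float:
    """Rough per-GPU memory (GB) for weights, grads and Adam state, excluding activations."""
    # Rule of thumb: about 16 bytes/param for mixed-precision Adam
    # (e.g. fp32 weights 4 + fp32 grads 4 + Adam first moment 4 + second moment 4).
    total_bytes = num_params * bytes_per_param
    # Assumes the training state is sharded evenly across GPUs (e.g. FSDP FULL_SHARD).
    return total_bytes / num_gpus / 1e9


print(training_state_gb(7e9))              # 112.0 on a single GPU
print(training_state_gb(7e9, num_gpus=8))  # 14.0 per GPU when sharded over 8 GPUs
```

Even before activations, a single GPU would need on the order of 100 GB for this state, so sharding across GPUs (or a lighter optimizer) matters more than sequence length here.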
I am unable to convert fine-tuning results to the 🤗 format for inference. Here's an example where I am able to do a simple fine-tune using the t5-small_dolly_sft.yaml...
## 🚀 Feature Request

In `StreamingTextDataset`, the `_read_binary_tokenized_sample()` method assumes the data is a numpy array of type `np.int64`. The default type should be `np.int32`, and the user should be...
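A minimal sketch of what a configurable token dtype could look like, using `numpy.frombuffer`. The standalone function `read_binary_tokenized_sample` and its `token_dtype` parameter are hypothetical illustrations, not the actual `StreamingTextDataset` method; the default of `np.int32` follows the request above. The usage shows why the dtype must match the one used when the tokens were written: decoding int32 bytes as int64 silently merges pairs of tokens into garbage values.

```python
import numpy as np


def read_binary_tokenized_sample(raw: bytes, token_dtype=np.int32) -> np.ndarray:
    """Decode a serialized buffer of token IDs (hypothetical standalone helper)."""
    # frombuffer reinterprets the bytes as token_dtype without copying;
    # .copy() returns a writable array independent of the input buffer.
    return np.frombuffer(raw, dtype=token_dtype).copy()


raw = np.array([1, 2, 3, 4], dtype=np.int32).tobytes()  # 16 bytes
print(read_binary_tokenized_sample(raw, np.int32))      # [1 2 3 4]
print(read_binary_tokenized_sample(raw, np.int64))      # 2 garbled values
```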
Could you add some notes to the Docker section on how to run inference? I tried

```sh
docker run -it --gpus all mosaicml/llm-foundry:2.0.1_cu118-latest bash
```

But there doesn't seem to...
Hi, I have downloaded a Docker image and I am trying to train a model, but I am running into an issue: I get 2 warnings and 1 error. The...
The web demo on Hugging Face isn't working. There seems to be an OOM error when loading the model.

## Environment

Web demo on Hugging Face.

## To reproduce

Got...
Confirmed fp16 is slightly better than bf16. I also edited the eval script to compute averages across benchmarks that have subscores and to log the table results in Markdown format...
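The averaging and Markdown logging described above might look roughly like the sketch below. The input layout (benchmark name mapped to a dict of subtask scores), the unweighted mean, and both function names are assumptions for illustration, not the eval script's actual code; the numbers in the usage line are toy values, not real results.

```python
from statistics import mean
from typing import Dict


def average_subscores(results: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each benchmark's subtask scores (hypothetical layout)."""
    return {bench: mean(subs.values()) for bench, subs in results.items()}


def to_markdown_table(averages: Dict[str, float]) -> str:
    """Render benchmark averages as a two-column Markdown table."""
    lines = ["| Benchmark | Average |", "| --- | --- |"]
    for bench, avg in sorted(averages.items()):
        lines.append(f"| {bench} | {avg:.4f} |")
    return "\n".join(lines)


# Toy inputs only: one benchmark with two subtasks, one with a single score.
toy = {"bench_a": {"sub1": 0.5, "sub2": 0.7}, "bench_b": {"only": 0.9}}
averages = average_subscores(toy)
print(to_markdown_table(averages))
```

An unweighted mean treats every subtask equally; if subtasks differ greatly in size, weighting by example count may be preferable.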
Some custom FC layers need custom kwargs. This PR enables that by changing `fc_type` from `str` to `Union[str, Dict]` and converting it to a dict thereafter. Default configs have also...
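A minimal sketch of the `str`-to-dict normalization described above. It assumes the dict form stores the layer name under a `"name"` key and passes every other entry through as layer kwargs; both helper functions and the layer name `my_custom_fc` are hypothetical, not the PR's actual code.

```python
from typing import Any, Dict, Tuple, Union


def normalize_fc_type(fc_type: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a str or dict fc_type into a dict (hypothetical helper)."""
    if isinstance(fc_type, str):
        # A bare string is treated as just the layer name, with no extra kwargs.
        return {"name": fc_type}
    if isinstance(fc_type, dict):
        if "name" not in fc_type:
            raise ValueError("fc_type dict must include a 'name' key")
        # Shallow copy so later edits don't mutate the caller's config.
        return dict(fc_type)
    raise TypeError(f"fc_type must be str or dict, got {type(fc_type).__name__}")


def split_fc_type(fc_type: Union[str, Dict[str, Any]]) -> Tuple[str, Dict[str, Any]]:
    """Separate the layer name from the kwargs to pass to its constructor."""
    cfg = normalize_fc_type(fc_type)
    name = cfg.pop("name")
    return name, cfg


print(split_fc_type("torch"))                               # ('torch', {})
print(split_fc_type({"name": "my_custom_fc", "bias": False}))  # ('my_custom_fc', {'bias': False})
```

Normalizing once, up front, keeps the rest of the code on a single dict path while older configs that pass a plain string keep working.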