llm-foundry
LLM training code for Databricks foundation models
Hi! Using the Docker image `mosaicml/llm-foundry:2.0.1_cu118-latest`, I'm training mpt-125m with your default parameters, but my loss explodes after some number of steps. I have added 2k warmup steps as well. It...
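For reference, a minimal sketch of a linear warmup schedule in plain PyTorch (the model, peak LR, and step count here are illustrative, not llm-foundry's defaults):

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

# Illustrative placeholders, not the repo's actual training setup.
model = torch.nn.Linear(10, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-4)

warmup_steps = 2000  # matches the 2k warmup steps mentioned above

def warmup_lambda(step: int) -> float:
    # Scale the LR linearly from 0 to its peak over the warmup window,
    # then hold it flat (a cosine decay would typically follow).
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda=warmup_lambda)

for step in range(5):
    optimizer.step()
    scheduler.step()
    print(step, scheduler.get_last_lr())
```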
Run with amp_fp16:

| Benchmark | Subcategory | Accuracy | Number few shot | Model |
|:----------|:------------|---------:|----------------:|:----------------|
| jeopardy | Average | 0.279767 | 0 | mosaicml/mpt-7b |
| | ...
Hello, I'm trying to fine-tune MPT-7B starting from `mpt-7b_dolly_sft.yaml`, but I observe that the train loss, cross entropy, and perplexity all stay fixed at a constant value throughout the entire training...
This fixes a bug in `hf_chat.py` where the custom system prompt and the user/assistant format strings were ignored. It also cleans up the implementation and adds streaming generation.
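For context, here is a minimal sketch of the kind of behavior described, using Hugging Face's `TextIteratorStreamer` for streaming generation; the format strings and the small model below are stand-ins, not `hf_chat.py`'s actual defaults:

```python
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical format strings standing in for hf_chat.py's options.
system_prompt = "You are a helpful assistant."
user_format = "<|user|>: {}"
assistant_format = "<|assistant|>:"

prompt = f"{system_prompt}\n{user_format.format('Hello!')}\n{assistant_format}"
inputs = tokenizer(prompt, return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# Run generation on a background thread and print tokens as they stream in.
thread = Thread(
    target=model.generate,
    kwargs=dict(**inputs, streamer=streamer, max_new_tokens=32),
)
thread.start()
for text in streamer:
    print(text, end="", flush=True)
thread.join()
```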
## ❓ Question Hi, I am trying to run zero-shot evaluation for the 30-billion-parameter `llama-30b`. Even with `batch_size = 1`, I am getting a `torch.cuda.OutOfMemoryError: CUDA out of...
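One common mitigation (generic `transformers` usage, not specific to this repo's eval harness) is to load the checkpoint in half precision and shard it across available GPUs with `device_map="auto"`; the model name below is a stand-in for the questioner's local checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "huggyllama/llama-30b" is a stand-in; substitute your local checkpoint path.
name = "huggyllama/llama-30b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,  # halves weight memory vs. fp32
    device_map="auto",          # shards layers across available GPUs (requires `accelerate`)
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```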
The below code: - adds a `SharedEmbedding` class that lets us get rid of an `F.linear` call. This is necessary with certain wrapping structures (our HF ones); otherwise FSDP emits...
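A minimal sketch of the idea (mirroring the description above, not necessarily the merged code): the same embedding weight serves both as the input embedding and, through the module's own forward, as the output unembedding, so no separate `F.linear` call on the raw parameter is needed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedEmbedding(nn.Embedding):
    """Embedding that can also unembed with the same (tied) weight matrix."""

    def forward(self, input: torch.Tensor, unembed: bool = False) -> torch.Tensor:
        if unembed:
            # Project hidden states back to vocab logits with the tied weight,
            # keeping the op inside the module so FSDP wrapping stays happy.
            return F.linear(input, self.weight)
        return super().forward(input)

emb = SharedEmbedding(num_embeddings=100, embedding_dim=16)
tokens = torch.randint(0, 100, (2, 8))
hidden = emb(tokens)                # (2, 8, 16) token embeddings
logits = emb(hidden, unembed=True)  # (2, 8, 100) output logits
```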
Minor bug: we can't pass devices like `torch.device('cuda:0')` into the autocast function. Instead you need to pass `torch.device('cuda:0').type`, which is `'cuda'`. Tested on an interactive instance.
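For illustration, the failing and working calls side by side:

```python
import torch

device = torch.device('cuda:0')

# torch.autocast(device)       # fails: autocast expects a device *type* string
with torch.autocast(device.type):  # device.type == 'cuda', which works
    pass
```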
Add Replit repo to our main README