llm-foundry
LLM training code for Databricks foundation models
Ran `composer train/train.py train/yamls/pretrain/mpt-3b.yaml`, also with `model.fc_type=te` and with `precision=amp_fp8`. Result:
```
torch:       throughput/device/tokens_per_sec: 23.7k
te:          throughput/device/tokens_per_sec: 23.7k
te with fp8: throughput/device/tokens_per_sec: 29.4k
```
Note: there does seem to be this...
This PR enables `--device_map auto`, which allows these scripts to be used with very large models that don't fit on a single GPU. It also removes the need for FSDP support @alextrott16...
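For context, here is a minimal sketch of what `device_map auto` corresponds to when loading a checkpoint with `transformers` (the model name is just an example, and `accelerate` must be installed):

```
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mosaicml/mpt-7b"
tokenizer = AutoTokenizer.from_pretrained(name)
# device_map="auto" lets accelerate shard the layers across the
# available GPUs (spilling to CPU if needed) instead of requiring
# the whole model to fit on a single device.
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",
    trust_remote_code=True,
)
```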
When I run with an eval set, I only get `metrics/eval`. I am wondering if there is a way to configure llm-foundry via yaml to also compute `loss/eval` in the...
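For reference, `loss/eval` here means the plain cross-entropy loss on the eval set; a minimal, framework-independent sketch of computing it with a Hugging Face model (the model name is just an example):

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

batch = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    out = model(**batch, labels=batch["input_ids"])
print(f"eval loss: {out.loss.item():.4f}")
```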
Background: PyTorch's DataLoader hangs on several machines (locally, on a VM, and on Colab) because the `num_workers` argument is set too high. Generally, when using multiple processes, we want to scale with the number of...
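As one illustration of the proposed scaling, the worker count can be capped at the host's CPU count before constructing the loader (the dataset and the numbers below are placeholders, not code from this issue):

```
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100).float())
# Cap the requested worker count at the number of available CPUs so the
# DataLoader does not oversubscribe worker processes and hang.
num_workers = min(8, os.cpu_count() or 1)
loader = DataLoader(dataset, batch_size=10, num_workers=num_workers)
```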
I am using a [g5.12xlarge](https://instances.vantage.sh/aws/ec2/g5.12xlarge) instance on AWS with 96 GB of GPU memory. I am attempting to finetune a model on a custom dataset. To accomplish this, I created a...
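If the custom dataset is local, one sketch of preparing it in the prompt/response jsonl layout that llm-foundry's finetuning dataloader consumes (the file name and examples are placeholders):

```
import json

# Placeholder examples; the finetuning path expects prompt/response
# pairs, one JSON object per line.
examples = [
    {"prompt": "What is 2 + 2?", "response": "4"},
    {"prompt": "Name a primary color.", "response": "Red"},
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```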
This is the first in a series of PRs that bring this library into compliance with `pyright`. These fixes should introduce no functional changes to the code. Before: ```...
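For readers unfamiliar with `pyright`, a typical fix of this kind looks like the Optional-narrowing below; this exact function is illustrative, not code from the PR:

```
from typing import Optional

def get_length(name: Optional[str]) -> int:
    # Explicit narrowing: after this check pyright knows `name` is a str,
    # so len(name) type-checks without any functional change.
    if name is None:
        return 0
    return len(name)
```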
I'm trying to use `hf_generate.py`; why is it not working with the flag `--attn_impl triton`? I also changed `config.attn_config['attn_impl'] = 'triton'` (from `'torch'`) in `convert_composer_to_hf.py`.
```
ValueError: Requirements for `attn_impl: triton` not installed....
```
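That error usually means the GPU extras, which pull in the triton attention requirements, aren't installed (e.g. via `pip install -e ".[gpu]"` in the llm-foundry repo). Once they are, the MPT model card's pattern for selecting the triton implementation looks like this (the model name is an example):

```
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-7b"
config = AutoConfig.from_pretrained(name, trust_remote_code=True)
# Switch the attention implementation from the default 'torch' to 'triton'.
config.attn_config["attn_impl"] = "triton"
model = AutoModelForCausalLM.from_pretrained(
    name, config=config, trust_remote_code=True
)
```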
## ❓ **Question**

I am trying to use the model through a Hugging Face pipeline, but the model didn't load. My code is:
```
llm = HuggingFacePipeline.from_model_id(model_id='mosaicml/mpt-7b-instruct', task="text-generation", trust_remote_code=True)
```
and it fails with `ValueError: Loading mosaicml/mpt-7b-instruct requires you...`
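One possible workaround, assuming the wrapper isn't forwarding `trust_remote_code` to `from_pretrained`: build the `transformers` pipeline yourself with the flag set, then hand it to `HuggingFacePipeline`:

```
from transformers import pipeline
from langchain.llms import HuggingFacePipeline

# Build the pipeline directly so trust_remote_code reaches from_pretrained.
pipe = pipeline(
    "text-generation",
    model="mosaicml/mpt-7b-instruct",
    trust_remote_code=True,
)
llm = HuggingFacePipeline(pipeline=pipe)
```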
This PR includes a handful of onboarding/tutorial resources and improvements, the majority of the change being a new `TUTORIAL.md` file that is meant to provide a more in-depth...