llm-foundry
LLM training code for Databricks foundation models
## 🚀 Feature Request
I have been investigating how we can make GPTQ work in order to quantize MPT models. It seems that a lot of progress has been made...
Add ActivationMonitor callback from composer
I'm interested in using `llm-foundry` infrastructure for training LLMs for sequence classification/regression tasks. I currently have a fork of `llm-foundry` where I got this working (in a fairly hacky manner...
I followed the tutorial at `train/finetune_example/mpt-7b-arc-easy--gpu.yaml` and added an additional evaluation using `icl_tasks: 'eval/yamls/tasks_light.yaml'` in order to evaluate accuracy on ARC Easy. As the model finetuned, training loss decreased, but...
## Environment

```bash
Collecting system information...
---------------------------------
System Environment Report
Created: 2023-08-21 17:44:51 CST
---------------------------------

PyTorch information
-------------------
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: ...
```
https://github.com/mosaicml/llm-foundry/blob/bd8127252c660e45ed01413645d29427f86c085a/scripts/data_prep/convert_dataset_json.py#L204C4-L204C4

`out=os.path.join(args.out_root),` should be `out=os.path.join(args.out_root, folder_split),` as in `convert_dataset_hf.py`.
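Below is a minimal sketch of the proposed fix in context. The surrounding loop, the `dataset_splits` mapping, and the placeholder values are assumptions for illustration; only the change to the `out=` argument comes from this report.

```python
import os
from argparse import Namespace
from streaming import MDSWriter

# Illustrative placeholders; in the real script these come from the CLI args and the tokenizer.
args = Namespace(out_root='/tmp/json_dataset_mds', compression='zstd')
columns = {'tokens': 'bytes'}
dataset_splits = {'train': [{'tokens': b'\x00\x01'}], 'val': [{'tokens': b'\x02\x03'}]}

for folder_split, split_dataset in dataset_splits.items():
    with MDSWriter(
        columns=columns,
        # was: out=os.path.join(args.out_root)          -> every split writes to the same directory
        out=os.path.join(args.out_root, folder_split),  # fix: one subdirectory per split
        compression=args.compression,
    ) as out:
        for sample in split_dataset:
            out.write(sample)
```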
## 🚀 Feature Request
Unify the export script arguments after the next version of LLM-Foundry to improve UX, but not immediately, so we don't break people's workflows.

## Motivation
Inference scripts have...
In the [\_\_iter\_\_](https://github.com/mosaicml/llm-foundry/blob/main/llmfoundry/data/data.py#L116) method of the `ConcatTokensDataset` class, the `dtype` argument is not specified in the statement `yield {'tokens': np.asarray(concat_sample).tobytes()}`. The integer dtype NumPy picks by default is platform-dependent (e.g. `np.int32` on some platforms). On the...
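To make the failure mode concrete, here is a small self-contained sketch; the function name `iter_token_samples` and the example token ids are invented for illustration, and only the `np.asarray(...).tobytes()` pattern comes from `ConcatTokensDataset`.

```python
import numpy as np

def iter_token_samples(samples, dtype=np.int64):
    """Sketch of the fix: pin an explicit dtype before serializing token ids to bytes."""
    for concat_sample in samples:
        # Without dtype=..., np.asarray picks a platform-dependent integer width,
        # so the byte layout written here may not match what a reader assumes.
        yield {'tokens': np.asarray(concat_sample, dtype=dtype).tobytes()}

# Usage: decode with the same dtype that was used to encode.
sample = next(iter_token_samples([[15496, 995, 50256]]))
tokens = np.frombuffer(sample['tokens'], dtype=np.int64)
```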
## Console

```
[Eval batch=1/1289] Eval on lambada_openai/0-shot data
[Eval batch=130/1289] Eval on lambada_openai/0-shot data
[Eval batch=259/1289] Eval on lambada_openai/0-shot data
[Eval batch=387/1289] Eval on lambada_openai/0-shot data
[Eval batch=516/1289] Eval on...
```
Finetuning mpt-7b and mpt-30b with QLoRA gives the error `ValueError: MPTForCausalLM does not support gradient checkpointing.` Is there a way to fix this?
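A possible workaround sketch, not a confirmed fix: assuming the error is raised by `gradient_checkpointing_enable()` (which PEFT's k-bit preparation calls by default) on a model class that does not declare gradient-checkpointing support, skipping that step avoids the crash at the cost of higher activation memory.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

model = AutoModelForCausalLM.from_pretrained(
    'mosaicml/mpt-7b',
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map='auto',
)

# use_gradient_checkpointing=False skips model.gradient_checkpointing_enable(),
# which is the call that raises "MPTForCausalLM does not support gradient checkpointing."
# This only avoids the error; it does not add gradient-checkpointing support to MPT.
model = prepare_model_for_kbit_training(model, use_gradient_checkpointing=False)
```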