llm-foundry

LLM training code for Databricks foundation models

Results: 267 llm-foundry issues, sorted by recently updated

Just a simple script to generate eval data in the right format for addition samples
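A minimal sketch of what such a script might look like, assuming a JSONL layout with `context`/`answer` fields (the field names are an assumption here, not confirmed by the issue):

```python
# Hypothetical sketch: emit addition problems as JSONL eval data.
# The "context"/"answer" field names are an assumption about the
# expected eval format, not taken from the issue itself.
import json
import random

random.seed(0)
with open("addition_eval.jsonl", "w") as f:
    for _ in range(1000):
        a, b = random.randint(0, 999), random.randint(0, 999)
        sample = {"context": f"{a} + {b} = ", "answer": str(a + b)}
        f.write(json.dumps(sample) + "\n")
```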

Hi, I want to test the model's inference on my hardware. I am using a single **A100** GPU instance with 60 GB of memory. I have created 4 processes (and 4 model instances,...
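A hedged sketch of the setup being described: several worker processes, each holding its own model copy on the same GPU. The model name and dtype are assumptions; four fp16 7B copies roughly fit in 60 GB but leave little headroom for activations.

```python
# Hypothetical sketch of multi-process, single-GPU inference.
import torch
import torch.multiprocessing as mp
from transformers import AutoModelForCausalLM, AutoTokenizer

def worker(rank: int, prompt: str) -> None:
    tok = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")
    model = AutoModelForCausalLM.from_pretrained(
        "mosaicml/mpt-7b", torch_dtype=torch.float16,
        trust_remote_code=True).to("cuda:0")
    out = model.generate(**tok(prompt, return_tensors="pt").to("cuda:0"),
                         max_new_tokens=32)
    print(rank, tok.decode(out[0]))

if __name__ == "__main__":
    mp.spawn(worker, args=("Hello",), nprocs=4)  # 4 processes, 4 instances
```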

question

This PR adds torch 2.0-based tensor parallel support for the FFN block. It's ported over from https://github.com/mosaicml/examples/pull/255. Currently the trained weights don't match between parallel/no-parallel versions even in a...
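For context, a minimal sketch of how torch 2.0's prototype tensor-parallel API shards a two-layer FFN across GPUs; the layer names (`up_proj`/`down_proj`) are hypothetical stand-ins, not the PR's actual module names.

```python
# Hedged sketch using the torch 2.0 prototype TP API.
# Assumes torch.distributed is already initialized across the GPUs.
import torch
import torch.nn as nn
from torch.distributed._tensor import DeviceMesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel, RowwiseParallel, parallelize_module)

class FFN(nn.Module):
    def __init__(self, d_model: int = 1024, expansion: int = 4):
        super().__init__()
        self.up_proj = nn.Linear(d_model, expansion * d_model)
        self.down_proj = nn.Linear(expansion * d_model, d_model)

    def forward(self, x):
        return self.down_proj(torch.relu(self.up_proj(x)))

mesh = DeviceMesh("cuda", list(range(torch.distributed.get_world_size())))
# Shard the first linear column-wise and the second row-wise, so the
# intermediate activation stays sharded and only one all-reduce is needed.
ffn = parallelize_module(
    FFN().cuda(), mesh,
    {"up_proj": ColwiseParallel(), "down_proj": RowwiseParallel()})
```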

This PR refactors the logging to: * centralize verbosity controls in the Python logging levels, instead of passing `verbose` arguments to many helper methods * downgrade most warnings to info...
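A sketch of the pattern the PR describes: one module-level logger whose level replaces per-function `verbose` flags (function and config names here are hypothetical).

```python
import logging

log = logging.getLogger(__name__)

def build_dataloader(cfg):
    # Previously gated behind a `verbose` argument; now surfaced only
    # when the caller opts into INFO-level verbosity.
    log.info("building dataloader with cfg=%s", cfg)

logging.basicConfig(level=logging.INFO)  # verbosity set once, by the caller
build_dataloader({"batch_size": 8})
```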

Hi MosaicML. AutoGPTQ is a package that aims to provide support for quantizing various LLMs. However, doing so has a few requirements. Here are a few issues: - MPTForCausalLM...
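For reference, a hedged sketch of the generic AutoGPTQ flow the issue is asking to support for MPT; at the time of the issue, MPTForCausalLM was not yet handled, so read this as the intended usage rather than a working recipe.

```python
# Sketch of the standard AutoGPTQ quantization flow (assumed to be
# the target workflow; MPT support is what the issue is requesting).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

name = "mosaicml/mpt-7b"
tok = AutoTokenizer.from_pretrained(name)
examples = [tok("Quantization calibration text.", return_tensors="pt")]

model = AutoGPTQForCausalLM.from_pretrained(
    name, BaseQuantizeConfig(bits=4, group_size=128),
    trust_remote_code=True)
model.quantize(examples)          # calibrate and pack 4-bit weights
model.save_quantized("mpt-7b-4bit")
```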

Got this error while finetuning the instruct model on an 8xA100 machine: ``` ERROR: expected to be in states [] but current state is TrainingState_.BACKWARD_PRE File "/usr/local/lib/python3.10/dist-packages/composer/core/engine.py", line 526, in _close callback.close(state, logger)...

I got the error below when finetuning with [mpt-7b_dolly_sft.yaml](https://github.com/mosaicml/llm-foundry/blob/main/scripts/train/yamls/finetune/mpt-7b_dolly_sft.yaml) Dataset: [mosaicml/dolly_hhrlhf](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) ``` [E ProcessGroupNCCL.cpp:828] [Rank 7] Watchdog caught collective operation timeout: WorkNCCL(SeqNum=1, OpType=BROADCAST, Timeout(ms)=600000) ran for 606142 milliseconds before timing out....
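A common mitigation for NCCL collective timeouts like the one above (the 600000 ms in the log corresponds to a 10-minute limit) is to raise the process-group timeout; a sketch with plain torch.distributed, assuming this is what the training setup would need:

```python
# Raise the NCCL process-group timeout so long-running collectives
# (e.g. a slow rank-0 broadcast of a large dataset) do not trip the
# watchdog. The 2-hour value is an illustrative choice, not a fix
# confirmed by the issue.
from datetime import timedelta
import torch.distributed as dist

dist.init_process_group(backend="nccl", timeout=timedelta(hours=2))
```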

Hi, I am trying to finetune the MPT-7B model using a local dataset on two A100 80GB GPUs. Below is the complete log. Torch version: 1.13.1+cu117. Appreciate any help...

Hi, I am trying to reproduce your zero-shot evals from Table 1 in the blog: https://www.mosaicml.com/blog/mpt-7b but the numbers I am seeing are much worse than the ones reported...

# G-Eval This is an implementation of G-Eval, which uses GPT-4 to judge model outputs without a ground truth (some groups have started using GPT-3.5 as a...
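A minimal sketch of the G-Eval idea: ask GPT-4 to score an output against criteria instead of comparing to a gold answer. The prompt wording and 1-10 scale are assumptions, not the PR's actual implementation, and the snippet uses the older openai<1.0 client.

```python
# Hypothetical LLM-as-judge call illustrating the G-Eval approach.
import openai

def judge(prompt: str, output: str) -> str:
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (f"Rate the response to the prompt on a 1-10 scale "
                        f"for helpfulness and correctness.\n"
                        f"Prompt: {prompt}\nResponse: {output}\nScore:")
        }])
    return resp["choices"][0]["message"]["content"]
```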