llm-foundry
LLM training code for Databricks foundation models
I'm trying to fine-tune DBRX on a single machine with 8 H100 GPUs. I keep getting OOM errors with different configurations, and I wonder whether this is even doable. I see...
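For reference, the usual levers for this kind of OOM in llm-foundry are FSDP full sharding, activation checkpointing, and (as a last resort) CPU offload. A minimal sketch of the relevant `fsdp_config` block, written as a Python dict; the key names follow common llm-foundry YAML examples, and the values are assumptions rather than a verified DBRX recipe:

```python
# Sketch of memory-saving FSDP settings, mirroring the fsdp_config block
# used in llm-foundry training YAMLs. Values are illustrative assumptions,
# not a verified recipe for DBRX on 8x H100.
fsdp_config = {
    "sharding_strategy": "FULL_SHARD",   # shard params, grads, and optimizer state
    "mixed_precision": "PURE",           # bf16 throughout to shrink activation memory
    "activation_checkpointing": True,    # recompute activations during backward
    "activation_cpu_offload": False,     # last resort: saves HBM at a large speed cost
    "limit_all_gathers": True,           # throttle all-gathers to bound peak memory
}
```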
**[WIP] Fix Batching** Adds a wrapper, similar to the OpenAI wrappers in [this PR](https://github.com/mosaicml/llm-foundry/pull/494), for TRT models. The purpose is to enable evaluating TRT models with our gauntlet,...
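The wrapper idea translates roughly to a class that adapts a TRT engine's inference call to the batched `generate` interface the eval harness expects. A minimal sketch; the engine API (`engine.infer`) and the method names are assumptions, not llm-foundry's actual code:

```python
class TRTEvalWrapper:
    """Hypothetical adapter exposing batched text generation over a TRT engine,
    in the same spirit as the OpenAI wrappers referenced above."""

    def __init__(self, engine, tokenizer, max_new_tokens: int = 128):
        self.engine = engine
        self.tokenizer = tokenizer
        self.max_new_tokens = max_new_tokens

    def generate(self, prompts: list[str]) -> list[str]:
        # Tokenize the whole batch at once so the engine sees fixed-shape input.
        batch = self.tokenizer(prompts, return_tensors="np", padding=True)
        # Run the (assumed) engine call, then decode back to text for scoring.
        output_ids = self.engine.infer(
            batch["input_ids"], max_new_tokens=self.max_new_tokens
        )
        return self.tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```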
## Context

We have an MPT MoD prefix-lm model trained with llm-foundry and then exported to HuggingFace (via your scripts). For some fine-tuning experiments with the HF model, I tried to...
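For context, loading an MPT checkpoint exported to the HuggingFace format goes through the `Auto` classes with `trust_remote_code=True`, since MPT ships custom modeling code alongside the weights. A minimal sketch; the checkpoint path is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path to the checkpoint produced by the llm-foundry export scripts.
ckpt = "path/to/exported-mpt"

# MPT uses custom modeling code shipped with the checkpoint, so
# trust_remote_code=True is required when loading via the Auto classes.
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(ckpt, trust_remote_code=True)
```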
This PR is stacked on top of the migration PR https://github.com/mosaicml/llm-foundry/pull/936. It does 5 things:
1. Refactor CodeEval and QA tasks to have a shared superclass called InContextLearningGenerationTaskDataset (see the sketch after this list).
2. Rename...
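The refactor in item 1 amounts to hoisting the generation plumbing that CodeEval and QA previously duplicated into one base class. A minimal sketch of what such a hierarchy could look like; the subclass names and methods here are hypothetical, not the PR's actual code:

```python
class InContextLearningGenerationTaskDataset:
    """Hypothetical shape of the shared superclass named in item 1."""

    def construct_context(self, example: dict) -> str:
        # Subclasses decide how an example becomes a prompt.
        raise NotImplementedError

    def build_batch(self, examples: list[dict]) -> list[str]:
        # Shared batching path: one implementation serves both task types.
        return [self.construct_context(ex) for ex in examples]


class QATaskDataset(InContextLearningGenerationTaskDataset):
    def construct_context(self, example: dict) -> str:
        return f"Question: {example['question']}\nAnswer:"


class CodeEvalDataset(InContextLearningGenerationTaskDataset):
    def construct_context(self, example: dict) -> str:
        return example["prompt"]  # code stubs are passed through verbatim
```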
JIRA: https://databricks.atlassian.net/jira/software/c/projects/STR/issues/STR-141?filter=allissues This script is useful in scenarios where the FT API data input has been malformed. It acts as a preventive measure to ensure data integrity and helps in...
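A sketch of the kind of check such a script performs on FT API input, assuming JSONL rows with prompt/response fields; the field names and error reporting here are illustrative assumptions:

```python
import json

def validate_ft_jsonl(path: str, required_keys=("prompt", "response")) -> list[str]:
    """Return human-readable problems found in a fine-tuning JSONL file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                row = json.loads(line)
            except json.JSONDecodeError as e:
                problems.append(f"line {lineno}: invalid JSON ({e})")
                continue
            for key in required_keys:
                # Each required field must exist, be a string, and be non-empty.
                if not isinstance(row.get(key), str) or not row[key].strip():
                    problems.append(f"line {lineno}: missing or empty {key!r}")
    return problems
```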
* add notebook/data_validation_notebook, which runs data preparation and token counting from the byod/data_validation branch. Merged to main to keep the underlying functions up-to-date (see the token-counting sketch after this list).
* add utils functions used by notebook/data_validation_notebook
* shuffle...
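The token-counting half of the notebook reduces to tokenizing each example and summing lengths. A minimal sketch with a HuggingFace tokenizer; the tokenizer name and field names are assumptions:

```python
from transformers import AutoTokenizer

# Tokenizer choice is an assumption; substitute whatever the run will train with.
tokenizer = AutoTokenizer.from_pretrained("mosaicml/mpt-7b")

def count_tokens(examples: list[dict]) -> int:
    """Total tokens across prompt+response pairs, as the notebook would tally them."""
    total = 0
    for ex in examples:
        total += len(tokenizer(ex["prompt"] + ex["response"])["input_ids"])
    return total
```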
Enable SpeedMonitor on HF models by using PyTorch FlopCounterMode to calculate model FLOPs.
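FlopCounterMode traces a forward/backward pass and tallies per-operator FLOPs, which gives SpeedMonitor a per-batch flop count without a hand-written formula. A minimal standalone sketch; the toy model stands in for an HF model, since FlopCounterMode works on any nn.Module:

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

# Toy stand-in for an HF model.
model = torch.nn.Linear(1024, 1024)
x = torch.randn(8, 1024)

flop_counter = FlopCounterMode(display=False)
with flop_counter:
    # Count forward and backward together, as a training step would.
    model(x).sum().backward()

flops_per_batch = flop_counter.get_total_flops()
print(f"{flops_per_batch=:,}")
```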