llm-foundry
LLM training code for Databricks foundation models
There are 4 TODOs regarding compiled flex attention that need to be investigated before checking in. See the tests for more details. TL;DR: - I think sequence lengths which are...
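For context, a minimal sketch of what "compiled flex attention" refers to in PyTorch (illustrative only, not the llm-foundry test code; shapes, dtype, and device are assumptions, and it requires PyTorch ≥ 2.5 with a GPU):

```python
# Illustrative sketch: compiling PyTorch's flex_attention and running it on an
# arbitrary sequence length. Not llm-foundry code; shapes/device are assumptions.
import torch
from torch.nn.attention.flex_attention import flex_attention

compiled_flex_attention = torch.compile(flex_attention)

# (batch, heads, seq_len, head_dim) -- seq_len is the dimension the TODOs are about.
q = torch.randn(2, 8, 256, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = compiled_flex_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 256, 64])
```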
## 🚀 Feature Request

TransformerEngine has advanced attention kernels, including support for FlashAttention-3 and low-precision kernels.

## Motivation

Having TransformerEngine's attention as an `attn_impl` option would be super nice due...
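A hypothetical sketch of the requested option, dispatching on `attn_impl` in a model's attention config. The `'te'` value and the builder below are assumptions illustrating the request, not existing llm-foundry code; `'flash'` and `'torch'` mirror the options that exist today:

```python
# Sketch only: how a TransformerEngine-backed attn_impl could be selected.
attn_config = {
    'attn_impl': 'te',   # proposed TransformerEngine backend (does not exist today)
    'attn_pdrop': 0.0,
}

def build_attention(attn_config: dict):
    impl = attn_config['attn_impl']
    if impl == 'te':
        # Would construct TransformerEngine's fused attention here, e.g.
        # transformer_engine.pytorch.DotProductAttention, to get FA-3 /
        # low-precision kernels. Left unimplemented since this is the proposal.
        raise NotImplementedError('TransformerEngine backend is the proposed feature')
    elif impl in ('flash', 'torch'):
        ...  # existing llm-foundry implementations
    else:
        raise ValueError(f'Unknown attn_impl: {impl}')
```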
Datasets throws this error: https://github.com/huggingface/datasets/blob/661d7bac29689e2d9eb74fba3d243939d6e9f25b/src/datasets/splits.py#L362 when a split doesn't match the regex. We catch this and re-raise it to the user.
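A minimal sketch of that catch-and-re-raise pattern, assuming the underlying check is the split-name validation at the linked line (the wrapper name and message are illustrative, not the exact llm-foundry code):

```python
# Catch datasets' split-name validation error and surface a clearer message.
from datasets import load_dataset

def load_split_or_explain(path: str, split: str):
    try:
        return load_dataset(path, split=split)
    except ValueError as e:
        # datasets raises ValueError when the split string fails its regex check.
        raise ValueError(
            f'Split name {split!r} was rejected by the datasets library. '
            'Check that the split exists and contains only valid characters.'
        ) from e
```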
## Manual Test

`test-log-model-no-save-hNNfeX`
https://dbc-559ffd80-2bfc.cloud.databricks.com/ml/experiments/482477677751793/runs/ea44b38569974edf8573b3f66558a15f?o=7395834863327820

Two models are logged, one per save batch. The model from the last batch (ba10) is registered; the ba5 model is only logged.
With streaming upgraded to 0.9.1, the unit test runs into an infinite loop.
When I set moe_loss_weight: 0:

```
[rank7]:   File "/home/syx/miniconda3/envs/lmf/lib/python3.11/site-packages/composer/trainer/trainer.py", line 2907, in
[rank7]:     **kwargs: self._train_microbatches(microbatches, loss_dict, **kwargs).item(),
[rank7]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank7]:   File "/home/syx/miniconda3/envs/lmf/lib/python3.11/site-packages/composer/trainer/trainer.py", line 3075, in _train_microbatches
[rank7]:     microbatch_loss_dict = self._train_microbatch(use_grad_scaling, current_batch_size,...
```