Sam Havens
@Arij-Aladel yes, in the original post Ofir says:

> The T5 model uses no positional information in cross-attention and I would recommend doing the same thing.
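For context, a minimal sketch in plain PyTorch (illustrative only, not the actual T5 or llm-foundry code) of what that means: cross-attention scores are just scaled QK^T with no positional term, whereas ALiBi-style self-attention adds a bias to the scores.

```python
import torch
import torch.nn.functional as F

def cross_attention(q, k, v):
    # No positional term: scores are just scaled QK^T (T5-style cross-attention).
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores, dim=-1) @ v

def self_attention_with_alibi(q, k, v, alibi_bias):
    # For contrast: ALiBi adds a per-head linear bias to the attention scores.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return F.softmax(scores + alibi_bias, dim=-1) @ v
```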
Hi @metacarbon, I have [a PR here](https://github.com/mosaicml/llm-foundry/pull/101) which hopefully addresses the issue you are running into. Basically, `load_path` is for Composer checkpoints; there is a different syntax for models loaded...
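To illustrate the distinction (a sketch only, with placeholder names and paths, not the exact llm-foundry config surface): Hugging Face weights are loaded with `from_pretrained`, while a Composer checkpoint is resumed through the Trainer's `load_path`.

```python
from composer import Trainer
from composer.models import HuggingFaceModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Names/paths below are placeholders for illustration.
HF_NAME = "mosaicml/mpt-7b"
COMPOSER_CKPT = "s3://my-bucket/my-run/ep0-ba1000-rank0.pt"

# 1) Hugging Face weights load via `from_pretrained`, not `load_path`.
hf_model = AutoModelForCausalLM.from_pretrained(HF_NAME, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(HF_NAME)

# 2) Composer checkpoints resume through the Trainer's `load_path`.
trainer = Trainer(
    model=HuggingFaceModel(hf_model, tokenizer=tokenizer),
    load_path=COMPOSER_CKPT,
)
```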
Before training starts, are you seeing a warning about some layer weights not being used?
I notice this PR has been open for a while; what's its status?
Not having to deal with CUDA for an RMSNorm kernel is appealing, yeah 😄 It's not high priority currently, but I wanted to keep tabs on this...
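For reference, the op itself is simple; a plain-PyTorch version (not the fused kernel being discussed) looks roughly like this:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Reference RMSNorm in plain PyTorch, for comparison with a fused kernel."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize by the root-mean-square over the last dimension, then scale.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```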
> except that it's applied in a somewhat nonstandard way in the fwd pass of the transformer module

@tginart Can you say more about this?
Also ACKing the request for a blog post.
If you are running transformer models locally without a GPU, including MPT, you should probably check out the GGML project. There is an open PR to add support for MPT: https://github.com/ggerganov/ggml/pull/145
When I run this, I am seeing the S3 download fail at 29% with:

```sh
Downloading ift/jsonl_test: 0%| | 0.00/906k [00:00
```
@hanlint yes, my understanding is that this support would require outputting the attention matrices from the attention module, which can't happen with FlashAttention; meaning...
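To make the constraint concrete, here is a minimal sketch in plain PyTorch (not the llm-foundry attention module): a naive implementation materializes the softmax(QK^T) matrix and can return it, while a fused FlashAttention-style path such as `F.scaled_dot_product_attention` only returns the output and never exposes the weights.

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Materializes the full attention matrix, so it can be returned/inspected.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    attn = F.softmax(scores, dim=-1)
    return attn @ v, attn

def fused_attention(q, k, v):
    # Fused kernels never build the full matrix, so there is nothing to return.
    return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(1, 8, 16, 64)
out, attn_weights = naive_attention(q, k, v)  # attn_weights: (1, 8, 16, 16)
out_fused = fused_attention(q, k, v)          # output only, no attention weights
```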