Nikhil Varghese

7 comments from Nikhil Varghese

I found that it comes from [here](https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L72). During initialization, the tokenizer does not read the max_length from the model. As a quick hack, I was able to update it to 4096...
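A minimal sketch of that kind of hack, assuming the usual Hugging Face tokenizer API (the model name and the 4096 value here are placeholders, not the exact code from the handbook):

```python
from transformers import AutoTokenizer

# Model name is a placeholder; use whichever checkpoint you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Some tokenizers come back with a huge sentinel value (~1e30) instead of the
# model's real context length, so override it explicitly.
if tokenizer.model_max_length > 100_000:
    tokenizer.model_max_length = 4096  # quick hack: hard-code the context length
```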

@symphonylyh Could you share if there's an update on this?

We have been able to use Triton with enc_dec models, so I'm not sure what the difference between that and (1) is. We find that the TPS for that implementation is...

@symphonylyh @shannonphu We have been able to use Flan-T5 with Triton. I believe this is (1). You can reproduce it [here](https://github.com/botitai/T5-TensorRT-LLM). Note that this is a much older version of...

Thanks for the update! This is excellent news; I'm sure it was a lot of effort to make it happen.

The default collator for SFTTrainer is [DataCollatorForLanguageModeling](https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling), which finetunes the model on the instruction AND the completion (the language-modeling loss is applied to every token), whereas [DataCollatorForCompletionOnlyLM](https://huggingface.co/docs/trl/main/en/sft_trainer#train-on-completions-only) masks the prompt tokens and trains exclusively on the completion...
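For reference, a minimal sketch of the completion-only setup along the lines of the linked TRL docs (the dataset, model name, and response template below are illustrative choices, not from the original thread):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Example model and dataset; swap in your own.
model_name = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

def formatting_prompts_func(example):
    # Assumed prompt format; adjust the template to match your data.
    return [
        f"### Question: {q}\n ### Answer: {a}"
        for q, a in zip(example["instruction"], example["output"])
    ]

# Loss is computed only on tokens that come after the response template,
# so the instruction part of each example is masked out.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()
```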

@winglian Agreed! `train_on_inputs: false` is what I really wanted to achieve. Could you point me to where that actually happens in the code? I couldn't find it myself.