Nikhil Varghese

7 comments from Nikhil Varghese

I found that it comes from [here](https://github.com/huggingface/alignment-handbook/blob/main/src/alignment/model_utils.py#L72). During initialization, the tokenizer does not read the max_length from the model. As a quick hack, I was able to update it to 4096...
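A minimal sketch of that kind of hack, assuming the usual Hugging Face tokenizer API (the model name and the 4096 value here are placeholders, not the exact code from the handbook):

```python
from transformers import AutoTokenizer

# Model name is a placeholder; use whichever checkpoint you are fine-tuning.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Some tokenizers come back with a huge sentinel value (~1e30) instead of the
# model's real context length, so override it explicitly.
if tokenizer.model_max_length > 100_000:
    tokenizer.model_max_length = 4096  # quick hack: hard-code the context length
```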

@symphonylyh Could you share if there's an update on this?

We have been able to use Triton with enc_dec models, so I'm not sure what the difference between that and (1) is. We find that the TPS for that implementation is...

@symphonylyh @shannonphu We have been able to use Flan-T5 with Triton. I believe this is (1). You can reproduce it [here](https://github.com/botitai/T5-TensorRT-LLM). Note that this is a much older version of...

Thanks for the update! This is excellent news; I'm sure it was a lot of effort to make it happen.

The default collator for SFTTrainer is [DataCollatorForLanguageModeling](https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/data_collator#transformers.DataCollatorForLanguageModeling), which finetunes the model on the instruction AND the completion (the language-modeling loss is applied to every token), whereas [DataCollatorForCompletionOnlyLM](https://huggingface.co/docs/trl/main/en/sft_trainer#train-on-completions-only) masks the prompt tokens and trains exclusively on the completion...
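For reference, a minimal sketch of the completion-only setup along the lines of the linked TRL docs (the dataset, model name, and response template below are illustrative choices, not from the original thread):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# Example model and dataset; swap in your own.
model_name = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

dataset = load_dataset("lucasmccabe-lmi/CodeAlpaca-20k", split="train")

def formatting_prompts_func(example):
    # Assumed prompt format; adjust the template to match your data.
    return [
        f"### Question: {q}\n ### Answer: {a}"
        for q, a in zip(example["instruction"], example["output"])
    ]

# Loss is computed only on tokens that come after the response template,
# so the instruction part of each example is masked out.
response_template = " ### Answer:"
collator = DataCollatorForCompletionOnlyLM(response_template, tokenizer=tokenizer)

trainer = SFTTrainer(
    model,
    train_dataset=dataset,
    formatting_func=formatting_prompts_func,
    data_collator=collator,
)
trainer.train()
```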

@winglian Agreed! `train_on_inputs: false` is what I really wanted to achieve. Could you point me to where that actually happens in the code? I couldn't find it myself.