August Moharrami
@lewtun Shouldn't this be a part of Datasets instead? Datasets already has [Interleaves](https://huggingface.co/docs/datasets/v3.0.1/en/process#interleave), which also mixes datasets in a similar way. It's not quite integrable as it is, but may...
> since the logits/dictionary needs to match between the teacher and student model, I do not think it's possible to train with closed models The Anthropic API doesn't output any logits or...
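To illustrate why distillation needs the teacher's logits over the same vocabulary, here is a minimal dependency-free sketch (the toy logits and 4-token vocab are made up): the student minimizes KL(teacher || student) per token, which requires a full teacher probability vector aligned index-for-index with the student's.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a logit vector.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(teacher_logits, student_logits):
    # KL(teacher || student); both vectors must index the SAME vocabulary.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Toy shared 4-token vocab: position i must mean the same token in both models.
teacher = [2.0, 1.0, 0.5, -1.0]
student = [1.5, 1.2, 0.3, -0.5]
loss = kl_divergence(teacher, student)
# A closed API that returns only sampled text (no logits) gives nothing to
# put in `teacher_logits`, so this objective cannot be computed.
```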
@lewtun After reading the paper, I noticed that the DPO checkpoints were combined with a different model rather than the reference model used in DPO training. So, I added an...
@coding-famer The callback has an optional parameter called `merge_at_every_checkpoint`, which merges the saved checkpoint either after every step or at the end of each epoch during training.
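To show how that flag gates the merge step, here is a hedged, pure-Python sketch; the class and hook names are illustrative stand-ins, not the callback's actual implementation:

```python
class MergeCallback:
    """Illustrative callback: merge either per saved checkpoint or per epoch."""

    def __init__(self, merge_at_every_checkpoint=False):
        self.merge_at_every_checkpoint = merge_at_every_checkpoint
        self.merged = []  # record of checkpoints we merged, for the sketch

    def _merge(self, checkpoint):
        # Stand-in for the real model-merging step.
        self.merged.append(checkpoint)

    def on_save(self, checkpoint):
        # Fires whenever a checkpoint is saved during training.
        if self.merge_at_every_checkpoint:
            self._merge(checkpoint)

    def on_epoch_end(self, checkpoint):
        # Fires at the end of each epoch.
        if not self.merge_at_every_checkpoint:
            self._merge(checkpoint)

cb = MergeCallback(merge_at_every_checkpoint=True)
cb.on_save("ckpt-100")        # merged, because the flag is True
cb.on_epoch_end("ckpt-epoch1")  # skipped, epoch-end merging is disabled
print(cb.merged)  # ['ckpt-100']
```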
This could go up on r/programmerhorror. Sir, I too struggled to understand the problem. From what I gathered, I have to ask: why would you group similar prompts together? There's...
@qgallouedec It's mostly `dummy-GPT2-correct-vocab`. Do you want to replace those too?
Could you please clarify what's needed here? It seems that none of the example scripts currently take `chat_template` as input.