What does this PR do?

The tokenization used for KTO has diverged significantly from that of DPO's (e.g., doesn't support images in input, different length truncation techniques, etc.). This PR uses helper functions from DPOTrainer to do the same kind of tokenization in KTOTrainer. Subsequent improvements to the DPO tokenization will now carry over automatically to KTO as well.

This works seem to work better in practice as well, at least with the sample kto dataset.

cc @kashif

Oct 06 '24 06:10 kawine

Some of the changes carry over from https://github.com/huggingface/trl/pull/2153 so maybe it's best to merge that one first?

Oct 06 '24 06:10 kawine

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Oct 06 '24 09:10 HuggingFaceDocBuilderDev

@kawine the kto test(s) failing...

Oct 06 '24 17:10 kashif

Sorry I forgot to rewrite the tests. Will fix in an hour.

Oct 06 '24 17:10 kawine

@kashif The tests are passing now.

Oct 06 '24 18:10 kawine

very cool! checking!

Oct 06 '24 18:10 kashif

can i add back the maybe_unpair_preference_dataset logic in the example script?

Oct 06 '24 19:10 kashif

Sure!

Oct 06 '24 19:10 kawine

@kashif does this look okay? I merged the latest changes in from main

Oct 09 '24 05:10 kawine

yes, I believe so, the only issue is that the dpo helpers being used somehow smell bad... we need to perhaps make it more modular or simplify it... let me ask around

Oct 09 '24 06:10 kashif

@kashif seems that there are already tokenization helper functions in utils, so I just moved the remaining methods that both DPO and KTO tokenization depend on to utils. This removes the dependency between trainers and makes the code simpler as well -- hopefully that should be fine?

Oct 10 '24 03:10 kawine

Thank you for this PR! I completely agree with your observations. I'm currently working on further refactoring the tokenization phase for DPO (#2209—feel free to contribute, by the way). I suggest putting this PR on hold for now, as the solution might become simpler once we've identified a more straightforward approach for DPO.

Oct 10 '24 09:10 qgallouedec

#2209 is now merged. We would like to do the same refactoring for KTO, if you're still interested in contributing, let us know :)

Oct 24 '24 15:10 qgallouedec

Change KTO tokenization to use DPO's

What does this PR do?