trl icon indicating copy to clipboard operation
trl copied to clipboard

Change KTO tokenization to use DPO's

Open kawine opened this issue 1 year ago • 13 comments

What does this PR do?

The tokenization used for KTO has diverged significantly from that of DPO's (e.g., doesn't support images in input, different length truncation techniques, etc.). This PR uses helper functions from DPOTrainer to do the same kind of tokenization in KTOTrainer. Subsequent improvements to the DPO tokenization will now carry over automatically to KTO as well.

This works seem to work better in practice as well, at least with the sample kto dataset.

Screenshot 2024-10-06 at 2 32 41 AM

cc @kashif

kawine avatar Oct 06 '24 06:10 kawine

Some of the changes carry over from https://github.com/huggingface/trl/pull/2153 so maybe it's best to merge that one first?

kawine avatar Oct 06 '24 06:10 kawine

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@kawine the kto test(s) failing...

kashif avatar Oct 06 '24 17:10 kashif

Sorry I forgot to rewrite the tests. Will fix in an hour.

kawine avatar Oct 06 '24 17:10 kawine

@kashif The tests are passing now.

kawine avatar Oct 06 '24 18:10 kawine

very cool! checking!

kashif avatar Oct 06 '24 18:10 kashif

can i add back the maybe_unpair_preference_dataset logic in the example script?

kashif avatar Oct 06 '24 19:10 kashif

Sure!

kawine avatar Oct 06 '24 19:10 kawine

@kashif does this look okay? I merged the latest changes in from main

kawine avatar Oct 09 '24 05:10 kawine

yes, I believe so, the only issue is that the dpo helpers being used somehow smell bad... we need to perhaps make it more modular or simplify it... let me ask around

kashif avatar Oct 09 '24 06:10 kashif

@kashif seems that there are already tokenization helper functions in utils, so I just moved the remaining methods that both DPO and KTO tokenization depend on to utils. This removes the dependency between trainers and makes the code simpler as well -- hopefully that should be fine?

kawine avatar Oct 10 '24 03:10 kawine

Thank you for this PR! I completely agree with your observations. I'm currently working on further refactoring the tokenization phase for DPO (#2209—feel free to contribute, by the way). I suggest putting this PR on hold for now, as the solution might become simpler once we've identified a more straightforward approach for DPO.

qgallouedec avatar Oct 10 '24 09:10 qgallouedec

#2209 is now merged. We would like to do the same refactoring for KTO, if you're still interested in contributing, let us know :)

qgallouedec avatar Oct 24 '24 15:10 qgallouedec