torchtune
Change default dataset in DPO configs to use HH-RLHF dataset
cc @RdoubleA we'll have to re-benchmark here