LMFlow
[Feature] Iterative DPO
After extensive experiments and testing, LMFlow now supports iterative DPO from a single Python script. Several other useful features ship alongside it:
- Multi-instance vLLM inference (using Ray)
- Multi-instance reward model (RM) inference (also using Ray)
- Sampling from an LMFlow dataset and train/test splitting an LMFlow dataset
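
To illustrate the last item, here is a minimal sketch of sampling and train/test splitting on a dataset in the LMFlow JSON layout (a dict with a `type` field and an `instances` list). The helper names `sample_dataset` and `train_test_split`, their parameters, and the toy data are hypothetical and chosen for illustration only; they are not LMFlow's actual API, which may differ.

```python
import random
from typing import Any, Dict, Tuple

# Assumed layout: {"type": "...", "instances": [{...}, ...]}
Dataset = Dict[str, Any]


def sample_dataset(dataset: Dataset, n: int, seed: int = 42) -> Dataset:
    """Return a new dataset with n instances drawn without replacement."""
    instances = dataset["instances"]
    if n > len(instances):
        raise ValueError("n exceeds the number of instances")
    rng = random.Random(seed)  # local RNG keeps the result reproducible
    return {"type": dataset["type"], "instances": rng.sample(instances, n)}


def train_test_split(
    dataset: Dataset, test_size: float = 0.2, seed: int = 42
) -> Tuple[Dataset, Dataset]:
    """Shuffle the instances and split them into (train, test) datasets."""
    shuffled = list(dataset["instances"])  # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    test = {"type": dataset["type"], "instances": shuffled[:n_test]}
    train = {"type": dataset["type"], "instances": shuffled[n_test:]}
    return train, test


# Toy dataset used only for this illustration.
toy = {
    "type": "text_only",
    "instances": [{"text": f"example {i}"} for i in range(10)],
}
train, test = train_test_split(toy, test_size=0.2, seed=0)
subset = sample_dataset(toy, n=3, seed=0)
```

Using a seeded local `random.Random` rather than the global generator keeps each split reproducible, which matters when the same prompts need to be reused across DPO iterations.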