Josh Hu issues


Results: 11 issues by Josh Hu

The process of RLHF and reward modeling

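If this model was produced by SFT on top of llama2, then according to the llama2 paper the base llama2 model does not appear to have gone through RLHF (llama2-chat did). Has Taiwan llama2 been trained with RLHF? If not, alignment for Traditional Chinese could be done with RLHF rather than SFT. As for the comparison dataset, you could consider generating it with ChatGPT. I don't know whether this has already been tried. Thanks.

![image](https://github.com/MiuLab/Taiwan-LLM/assets/10594453/51e4d586-4a90-4542-8c1d-410a301f52ed)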
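To make the suggestion concrete, here is a minimal sketch of the pairwise loss commonly used to train a reward model on comparison data. It is an illustration only, not the Taiwan-LLM training code: the function name `pairwise_reward_loss`, the record layout, and the example scores are all hypothetical, and a real reward model would produce the scores from a neural network rather than from hand-written numbers.

```python
import math


def pairwise_reward_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the preferred
    response above the rejected one, and large otherwise.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical comparison record: one prompt with a preferred and a
# rejected response (for example, ranked by a stronger model or by people).
comparison = {
    "prompt": "<prompt in Traditional Chinese>",
    "chosen": "<preferred response>",
    "rejected": "<less preferred response>",
}

# Hypothetical scalar scores a reward model might assign to each response.
loss_when_correct = pairwise_reward_loss(reward_chosen=2.0, reward_rejected=0.0)
loss_when_wrong = pairwise_reward_loss(reward_chosen=0.0, reward_rejected=2.0)
print(loss_when_correct < loss_when_wrong)
```

In an RLHF pipeline the reward model trained this way would then score samples from the policy during the reinforcement learning stage. Whether comparisons generated by ChatGPT are good enough for Traditional Chinese alignment is the open question raised above.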