Shiyue Xu
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction The new version fails to load the UI, while last year's old version loads normally. ### Expected behavior _No response_ ### System Info _No response_ ### Others...
### System Info SOFTWARE: - transformers==4.39.3 - peft==0.9.0 - accelerate==0.27.2 - torch==1.13.1 HARDWARE: - NVIDIA V100 ### Who can help? @ArthurZucker @muellerzr @SunMarc ### Information - [ ] The official...
### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Training fails to start from the command line on both my custom training set and validation set. The dataset format is as follows, and `data_info.json` is configured as: ``` "hh-rlhf-chosen-train": { "file_name": "hh-rlhf-chosen-train.json", "formatting": "sharegpt", "columns": {...
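For context, a complete `dataset_info.json` entry for sharegpt-formatted data might look like the sketch below. The `columns` and `tags` fields are assumptions based on common LLaMA-Factory conventions, not recovered from the truncated snippet above; the actual entry may differ.

```json
"hh-rlhf-chosen-train": {
  "file_name": "hh-rlhf-chosen-train.json",
  "formatting": "sharegpt",
  "columns": {
    "messages": "conversations"
  },
  "tags": {
    "role_tag": "from",
    "content_tag": "value",
    "user_tag": "human",
    "assistant_tag": "gpt"
  }
}
```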
When I have already set `mixed_precision: bf16` in the accelerate config, should I also set `bf16=True` in the training args?
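For reference, a minimal sketch of where the two settings live, assuming a default accelerate config file; all values other than `mixed_precision` are illustrative assumptions.

```yaml
# Sketch of an accelerate config (e.g. default_config.yaml).
# Only mixed_precision is relevant to the question; other keys are examples.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: bf16
num_processes: 2
```

On the Python side, `TrainingArguments(bf16=True)` requests the same dtype from the Trainer. In my understanding it is safest to keep the two consistent, since the accelerate config applies when launching via `accelerate launch` while the training args govern the Trainer's own precision setup.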
Why is the IPO loss computed with `average_log_prob=True`? In the function `concatenated_forward`, when `loss_type` equals 'ipo' the parameter `average_log_prob` is set to True, but according to the loss formula of IPO, the length...
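To make the distinction concrete, here is a minimal plain-Python sketch (with hypothetical per-token log-probabilities) of the two reductions that an `average_log_prob` flag typically switches between: the summed sequence log-prob used by standard DPO, and the length-normalized mean that IPO-style length normalization corresponds to.

```python
def sequence_logprob(token_logps, average=False):
    """Reduce per-token log-probabilities to a single sequence score.

    average=False -> sum of token log-probs (standard DPO-style logp)
    average=True  -> length-normalized mean (what average_log_prob=True yields)
    """
    total = sum(token_logps)
    return total / len(token_logps) if average else total

# Hypothetical per-token log-probs for a 3-token response
logps = [-1.0, -2.0, -3.0]
print(sequence_logprob(logps))                # -6.0 (sum; grows with length)
print(sequence_logprob(logps, average=True))  # -2.0 (mean; length-insensitive)
```

The practical difference is that the summed logp penalizes longer responses, while the mean removes that length dependence, which is the usual motivation for setting `average_log_prob=True` for IPO.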
Hello, I want to load the `training_arg.bin` of [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) and pass it to the `DPOTrainer` of `trl` to conveniently compute the implicit rewards and logps, but it seems to lack some private...
I want to get the logp and reward of the data through `predict`, but the prediction seems to include only one sample. What is the correct usage of `predict`?
I would like to confirm whether the default configuration `annotators_config=weighted_alpaca_eval_gpt4_turbo` and `reference_outputs=gpt4_turbo` is used when evaluating with alpaca_eval. Thank you!
I upgraded to the latest unsloth and trained Qwen2.5-3B with GRPO. I set different hyperparams as follows, but according to the terminal output, are these two settings different? I'm...
# Problem description (e.g., a typo) + page number or equation number: p178, third paragraph, first sentence "When the value ranges of different features differ greatly, it will also the search efficiency of gradient descent" should be changed to: "When the value ranges of different features differ greatly, it will also [reduce] the search efficiency of gradient descent" - [ ] Typo: P66, the line below Equation 2.3, change "1111" to "2222" (see example 1) - [ ] Derivation error: P66, the derivation in Equation 2.3 is wrong, ... (see example 2)