Shiyue Xu

Results 10 issues of Shiyue Xu

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction The new version fails to load the interface, while last year's old version loads normally. ![32abd75fc4319ab552e183ddd6c834d](https://github.com/hiyouga/LLaMA-Factory/assets/66808901/aff20006-ea09-4864-aa24-121acad40992) ### Expected behavior _No response_ ### System Info _No response_ ### Others...

pending

### System Info SOFTWARE: - transformers==4.39.3 - peft==0.9.0 - accelerate==0.27.2 - torch==1.13.1 HARDWARE: - NVIDIA V100 ### Who can help? @ArthurZucker @muellerzr @SunMarc ### Information - [ ] The official...

trainer

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Training fails to start from the command line on both the custom training set and the custom validation set. ![image](https://github.com/hiyouga/LLaMA-Factory/assets/66808901/86627a3b-5d04-4963-9f24-04e2f6258ee6) The dataset format is as follows: ![image](https://github.com/hiyouga/LLaMA-Factory/assets/66808901/ff872a30-776e-4d79-a80d-b787a5dd199a) The configuration in `data_info.json` is as follows: ``` "hh-rlhf-chosen-train": { "file_name": "hh-rlhf-chosen-train.json", "formatting": "sharegpt", "columns": {...

solved

When I have already set `mixed_precision: bf16` in the accelerate config, should I also set `bf16=True` in the training args?

Why compute the IPO loss using `average_log_prob=True`? In the function `concatenated_forward`, when `loss_type` equals 'ipo', the parameter `average_log_prob` is set to True, but according to the loss formula of IPO, the length...
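To make the distinction in this question concrete, here is a minimal sketch of the two reductions a flag like `average_log_prob` would switch between: summing per-token log-probabilities versus averaging them over the unmasked tokens. The helper `sequence_log_prob` and the toy numbers are hypothetical and written only for illustration; this is not the `trl` implementation of `concatenated_forward`.

```python
from typing import List


def sequence_log_prob(
    token_logps: List[float],
    mask: List[int],
    average_log_prob: bool = False,
) -> float:
    """Reduce per-token log-probs to one sequence-level value.

    Hypothetical helper: tokens whose mask entry is 0 (e.g. prompt or
    padding positions) are ignored in both reductions.
    """
    kept = [lp for lp, m in zip(token_logps, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    total = sum(kept)
    if average_log_prob:
        # Length-normalized: divide by the number of unmasked tokens.
        return total / len(kept)
    # Unnormalized: plain sum, so longer sequences get lower values.
    return total


# Toy example: three response tokens and one masked position.
token_logps = [-0.5, -1.5, -1.0, -9.0]
mask = [1, 1, 1, 0]

summed = sequence_log_prob(token_logps, mask)
averaged = sequence_log_prob(token_logps, mask, average_log_prob=True)
print(summed, averaged)
```

With the sum, the value scales with response length; with the average, it does not, which is the length dependence the question refers to.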

Hello, I want to load the `training_arg.bin` of [zephyr-7b-beta](https://huggingface.co/HuggingFaceH4/zephyr-7b-beta) and pass it to the `DPOTrainer` of `trl` to compute the implicit rewards and logps conveniently, but it seems to lack some private...

I want to get the logp and reward for each sample in the data through `predict`, but the prediction seems to include only one sample. What is the correct usage of `predict`? ![image](https://github.com/user-attachments/assets/81f441c0-b908-4614-be67-ac5542ecb18b)

I would like to confirm whether the default configuration `annotators_config=weighted_alpaca_eval_gpt4_turbo` and `reference_outputs=gpt4_turbo` is used when evaluating with alpaca_eval. Thank you!

I upgraded to the latest unsloth and trained Qwen2.5-3B with GRPO. I set different hyperparameters as follows, but according to the information in the terminal, these two settings are different? I m...

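# Problem description (e.g., typo) + page number or equation number p178, first sentence of the third paragraph: “不同特征取值范围差异比较大时还会梯度下降法的 搜索效率” should be changed to “不同特征取值范围差异比较大时还会【降低】梯度下降法的 搜索效率” (that is, the verb "reduce" is missing from "large differences in feature value ranges also [reduce] the search efficiency of gradient descent") - [ ] Typo: P66, the line below Equation 2.3: change “1111” to “2222” (see example 1) - [ ] Derivation error: P66, the derivation in Equation 2.3 is wrong, ... (see example 2)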