Gong Zi
We apologize for the delayed response. To address your issue, please try the following:
- Use `qwen` as the `model_type` to load models based on Qwen and Qwen1.5 versions (see the sketch below).
- ...
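As a rough sketch of where this setting lives (the exact config file and key names may differ across MFTCoder versions, so treat them as assumptions rather than the canonical interface):

```python
# Hypothetical excerpt of an MFTCoder training config; real configs are JSON
# files, and key names may vary between versions.
train_config = {
    "model_type": "qwen",  # "qwen" covers Qwen and Qwen1.5 based models, per the note above
    "pretrained_model_path": "/path/to/your/qwen/checkpoint",  # placeholder path
    # ... remaining arguments unchanged ...
}
```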
We apologize for the delayed response. For evaluating the HumanEval metric, as well as a variety of other code-related metrics, we recommend referring to the codefuse-evaluation repository (https://github.com/codefuse-ai/codefuse-evaluation).
We apologize for the delayed response. To address your issue, please follow these steps:
1. First, use the `run_offline_tokenization.sh` script to tokenize your data.
2. Then, make the following modifications...
We apologize for the delayed response. We have provided detailed explanations for each CoBa parameter, along with recommended settings, in the "CoBa Arguments Configuration" section of the mftcoder_accelerate README. These...
> [#2919](https://github.com/volcengine/verl/issues/2919) - Suggested a fix in an issue which I raised. Maybe that would fix your issue.

Thanks for your reply! I have tried passing `n_micro_batches` as a parameter...
> Hope you removed the division by n micro batches in below line too for step loss logging
>
> ```
> for micro_batch in micro_batches:
>     loss = self._compute_loss_and_backward(batch=micro_batch,...
> ```
Yup, I noticed this before and tried it, but the loss was still inconsistent. However, I didn't try it with `loss /= n_micro_batches` performed before `loss.backward()`. I'll try it again...
That said, I think we should keep `grad_scaler=True`, because the loss of the **sft-qwen3-32b-lr5e-5-32k-gpu64-bsz32-sp2** experiment is correct.
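For context, here is a minimal sketch of the normalization being discussed, in plain PyTorch with placeholder names (`compute_loss`, `micro_batches`, `accumulation_step`), not the actual verl trainer code path: each micro-batch loss is divided by `n_micro_batches` before `backward()` so gradients average correctly, while the logged step loss is kept unscaled.

```python
def accumulation_step(model, optimizer, micro_batches, compute_loss):
    """One optimizer step with gradient accumulation over micro-batches.

    Placeholder sketch: `compute_loss` and `micro_batches` are hypothetical,
    not the verl trainer's actual interface.
    """
    optimizer.zero_grad()
    n_micro_batches = len(micro_batches)
    logged_loss = 0.0
    for micro_batch in micro_batches:
        loss = compute_loss(model, micro_batch)
        logged_loss += loss.detach().item()   # accumulate the unscaled loss for logging
        (loss / n_micro_batches).backward()   # scale before backward so gradients average
    optimizer.step()
    return logged_loss / n_micro_batches      # mean per-micro-batch loss for step logging
```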
Hi, thanks for the follow-up. After the ablation runs, I'm afraid I haven't been able to pinpoint the root cause of the inconsistent loss.
This issue may not be related to which MFT loss was used. It's possible that the problem stems from an incorrect `model_type` setting (`qwen` vs. `qwen2`). Could you...
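If it helps, one quick way to check which Qwen generation a checkpoint actually belongs to before choosing `model_type` (the path below is a placeholder):

```python
from transformers import AutoConfig

# Placeholder path; point this at the checkpoint you are fine-tuning.
config = AutoConfig.from_pretrained("/path/to/your/checkpoint", trust_remote_code=True)
print(config.architectures)
# Original Qwen checkpoints typically report "QWenLMHeadModel", while
# Qwen1.5/Qwen2 checkpoints report "Qwen2ForCausalLM"; then set model_type
# according to what the MFTCoder README prescribes for that family.
```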