Taiwei Shi
Agree with @srhthu. I think left padding makes more sense, but [train.py](https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py) uses right padding instead. I suspect the code they used to train Alpaca is simply incorrect...
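To make the padding-side point concrete, here is a minimal sketch of why left padding matters for batched generation with a decoder-only model (the token ids and pad id below are made up for illustration):

```python
PAD = 0  # hypothetical pad token id

def pad_batch(seqs, side="left"):
    """Pad a batch of token-id lists to equal length on the given side."""
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        padding = [PAD] * (max_len - len(s))
        out.append(padding + s if side == "left" else s + padding)
    return out

batch = [[5, 6, 7], [8, 9]]
left = pad_batch(batch, side="left")    # [[5, 6, 7], [0, 8, 9]]
right = pad_batch(batch, side="right")  # [[5, 6, 7], [8, 9, 0]]

# With left padding, the last position of every row is a real token,
# so generation can append the next token directly after it.
assert all(row[-1] != PAD for row in left)
# With right padding, the shorter row ends in PAD, so naively sampling
# the "next" token would continue from a pad position.
assert right[1][-1] == PAD
```

(For training with teacher forcing, right padding can be fine as long as pad positions are masked out of the loss; the padding side mainly bites at generation time.)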
The code to plot the pie chart is [here](https://github.com/yizhongw/self-instruct)
When I chat with Phi-3-Small, the model often fails to predict the stop token. Perhaps the chat template for Phi-3-Small is wrong? A similar issue can be found here: #4712
Instead of using a linear predictor, GenRM leverages CoT and next-token prediction to provide rewards, and has been shown to be more accurate. https://arxiv.org/abs/2410.12832
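One common way a generative reward model turns next-token prediction into a scalar reward (a sketch of the general technique, not necessarily the exact recipe in the paper above): prompt the model with a verification question after its CoT, then read the relative probability of a "Yes" vs. "No" verdict token. The token names and logits here are hypothetical:

```python
import math

def softmax(logits):
    """Softmax over a {token: logit} map, stabilized by max-subtraction."""
    m = max(logits.values())
    exps = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def genrm_reward(next_token_logits, yes_token="Yes", no_token="No"):
    """Reward = P(yes) / (P(yes) + P(no)) at the verdict position.
    `next_token_logits` is a hypothetical {token: logit} map produced by the
    LM after it has generated its chain of thought."""
    probs = softmax(next_token_logits)
    p_yes = probs.get(yes_token, 0.0)
    p_no = probs.get(no_token, 0.0)
    return p_yes / (p_yes + p_no)

# Hypothetical logits at the verdict position: the model leans "Yes".
reward = genrm_reward({"Yes": 2.0, "No": 0.5, "Maybe": -1.0})
assert 0.5 < reward < 1.0
```

The normalization over just the yes/no pair keeps the reward in [0, 1] even when probability mass leaks to other tokens.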
> > Instead of using a linear predictor, GenRM leverages CoT and next-token prediction to provide reward. GenRM is proven to be more accurate. https://arxiv.org/abs/2410.12832
>
> Are there any...
Disabling torch.compile is useful here: torch.compile can also hang PPO training when use_remove_padding is enabled. #387
After some debugging, I found that enabling use_remove_padding for the critic does not hang the training; enabling it for the actor does. It hangs at [this line](https://github.com/volcengine/verl/blob/99fb2dde7715da1b37f6137e95daee6890dd7866/verl/workers/actor/dp_actor.py#L103).
After even more debugging, I found that if we change [```self.compute_entropy_from_logits = torch.compile(verl_F.entropy_from_logits, dynamic=True)```](https://github.com/volcengine/verl/blob/99fb2dde7715da1b37f6137e95daee6890dd7866/verl/workers/actor/dp_actor.py#L56) to ```self.compute_entropy_from_logits = verl_F.entropy_from_logits```, the program runs with no issues. I also tried setting ```dynamic=False```...
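For reference, the quantity being compiled here is just the entropy of the next-token distribution, H = logsumexp(logits) - Σ p_i · logit_i with p = softmax(logits). A plain-Python sketch of that identity (not verl's actual tensor implementation):

```python
import math

def entropy_from_logits(logits):
    """Entropy of the categorical distribution defined by `logits`.
    Uses H = logsumexp(logits) - sum(p_i * logit_i), which follows from
    log p_i = logit_i - logsumexp(logits). Plain-Python sketch; verl's
    version does the same math on torch tensors (and is what gets
    wrapped in torch.compile)."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    probs = [math.exp(x - lse) for x in logits]
    return lse - sum(p * x for p, x in zip(probs, logits))

# Uniform logits over k outcomes give entropy ln(k).
assert abs(entropy_from_logits([0.0, 0.0]) - math.log(2)) < 1e-9
```

Since the op is this simple, running it eagerly instead of compiled costs little, which is presumably why skipping torch.compile is an acceptable workaround.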
We can now disable torch.compile by setting a flag in the config file. #554
This issue might also be the cause of #3881