Flish Wang
I also hit this bug, and created an issue on the PyTorch side.
Some workarounds that may help:
- Decorate at least one **forward** function of the model with torch.compile **before** the Triton kernel is called (see the sketch below). The more compiled functions there are, the...
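For illustration, a minimal sketch of that first workaround, assuming a toy model (`MyModel` is a hypothetical name, not from the issue):

```python
import torch

class MyModel(torch.nn.Module):  # hypothetical toy model for illustration
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    @torch.compile  # compile this forward before any Triton kernel is launched
    def forward(self, x):
        return self.linear(x)

model = MyModel().cuda()
model(torch.randn(2, 16, device="cuda"))  # warm-up: triggers compilation first
# ...only after this should the code path that launches the Triton kernel run
```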
Parameters generally should not be changed directly in the forward pass. As a best practice, you may use self.register_buffer instead (see the sketch below). Anyway, if you really want to change the data in a...
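A minimal sketch of the buffer approach (the module and update rule are made up for illustration):

```python
import torch

class RunningStats(torch.nn.Module):  # hypothetical module for illustration
    def __init__(self, dim: int):
        super().__init__()
        # buffers live in state_dict and follow .to()/.cuda(),
        # but parameters() never returns them, so the optimizer is untouched
        self.register_buffer("running_mean", torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # updating a buffer in-place during forward is fine; mutating an
        # nn.Parameter here would fight autograd and the optimizer
        with torch.no_grad():
            self.running_mean.mul_(0.9).add_(x.mean(dim=0), alpha=0.1)
        return x - self.running_mean
```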
> Is what you're looking for `a=A().to(memory_format=torch.channels_last_3d)`

Nope. The model is for 2D images, and the 5D param is something like tokens/masks/attention biases/position embeddings.
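To make that concrete, a hypothetical sketch of such a 5D parameter in a 2D image model (shapes and names invented):

```python
import torch

class Model2D(torch.nn.Module):  # hypothetical 2D image model
    def __init__(self, groups: int = 4, heads: int = 8, tokens: int = 196):
        super().__init__()
        # a 5D learnable tensor, e.g. a per-group attention bias of shape
        # (groups, heads, tokens, tokens, 1): it is not a 3D-conv activation,
        # so the channels_last_3d memory format does not apply to it
        self.attn_bias = torch.nn.Parameter(
            torch.zeros(groups, heads, tokens, tokens, 1)
        )
```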
> Root cause: torch.compile may be incompatible with torch.cuda.is_current_stream_capturing(). Ref: https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/deepseek_v2.py#L715
>
> Success: `python3 -m sglang.launch_server --model /DeepSeek-V3 --tp 8 --trust-remote-code --mem-fraction-static 0.7 --cuda-graph-max-bs 16`
>
> Failed: `python3 -m sglang.launch_server...
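If that diagnosis is right, one possible workaround is to keep the capture check out of the compiled region with `torch.compiler.disable`, so it is evaluated at runtime instead of being traced. This is only a sketch of the idea, not sglang's actual fix:

```python
import torch

@torch.compiler.disable  # graph-break: evaluate the capture state eagerly
def is_capturing() -> bool:
    return torch.cuda.is_current_stream_capturing()

@torch.compile
def scaled(x: torch.Tensor) -> torch.Tensor:
    # branch on the live capture state rather than a traced-in constant
    if is_capturing():
        return x * 2.0      # CUDA-graph-safe path (illustrative)
    return x * 2.0 + 0.0    # regular eager path (illustrative)
```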
> I have another question: computing the reward seems to run serially, during which GPU utilization is 0, and waiting for the genRM to score takes a long time. Is there a way, or an example, of doing this asynchronously, e.g. scoring each rollout with the genRM as it finishes, instead of waiting for all rollouts to complete before scoring?

The async rollout server in the latest verl seems to be able to compute_reward right after each rollout finishes. Maybe you can call the genRM server during compute_reward...
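A minimal sketch of that overlap using a thread pool; `generate_rollout` and `genrm_score` are hypothetical stand-ins, not verl APIs:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def generate_rollout(prompt: str) -> str:
    time.sleep(0.1)                      # stand-in for actual generation
    return f"response to {prompt}"

def genrm_score(rollout: str) -> float:
    time.sleep(0.2)                      # stand-in for a genRM server call
    return float(len(rollout))

def rollout_and_score(prompts):
    # hand each finished rollout to the genRM immediately, so the
    # (I/O-bound) scoring call overlaps with the next generation
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(genrm_score, generate_rollout(p)) for p in prompts]
        return [f.result() for f in futures]

print(rollout_and_score(["q1", "q2", "q3"]))
```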