Guangming Sheng
@AIRobotZhang In FSDP, `ShardingStrategy.FULL_SHARD` is equivalent to ZeRO-3.
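For reference, a rough summary (my own mapping, not taken from the FSDP docs) of how the FSDP sharding strategies line up with DeepSpeed's ZeRO stages:

```python
# Illustrative mapping only: FSDP ShardingStrategy names vs. the
# DeepSpeed ZeRO stage they roughly correspond to.
ZERO_EQUIVALENT = {
    "FULL_SHARD": 3,     # shard parameters, gradients, and optimizer state (ZeRO-3)
    "SHARD_GRAD_OP": 2,  # shard gradients and optimizer state only (ZeRO-2)
    "NO_SHARD": 0,       # nothing sharded; behaves like plain DDP
}

print(ZERO_EQUIVALENT["FULL_SHARD"])  # 3
```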
Hi @accupham , thanks for your questions! > So what would be the best way to add in dynamic function calling? Hook the [generate method of vLLM's LLM class](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L376), then...
> Actually-- I thought about it a bit more. Perhaps the best way is to implement a custom LogitsProcessor for vLLM, which does this function calling by hijacking the logits...
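For context, a vLLM logits processor is a callable that receives the tokens generated so far plus the next-token logits and returns modified logits. A minimal sketch of the "hijacking" idea, using plain Python floats instead of torch tensors and a hypothetical `force_tokens` helper (not vLLM API):

```python
import math

def force_tokens(allowed_ids):
    """Hypothetical logits-processor factory: mask every logit except the
    allowed token ids, steering generation into a function-call sequence.
    (Real vLLM processors take/return torch tensors; plain lists keep this
    sketch self-contained.)"""
    def processor(token_ids, logits):
        return [logit if i in allowed_ids else -math.inf
                for i, logit in enumerate(logits)]
    return processor

# Force the sampler toward token id 2 regardless of the raw scores.
proc = force_tokens({2})
masked = proc([], [0.1, 0.5, 0.2, 0.9])
print(masked)  # [-inf, -inf, 0.2, -inf]
```

In real usage one would pass such a callable via `SamplingParams(logits_processors=[...])` and switch the allowed-token set on and off when a function-call trigger token is detected.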
@accupham The API design is really nice from my perspective. However, it seems that it relies on vLLM 0.7.0 for the chat API. We're working on integrating it in: #116...
@accupham Sorry for the late response. I've been busy recently; I'll look into your proposal this weekend.
Hi @hxdtest , the megatron_v4.patch is necessary for veRL for two main reasons: 1. In veRL, we didn't initialize Megatron-LM with `initialize_megatron`, which initializes the global args. We only build...
@hxdtest , we haven't tested verl on the 405B model. I think we can try it by using a larger TP size in rollout or implementing pipeline parallelism in vLLM...
You may be encountering a memory leak in the FSDP checkpoint-saving method.
This is a design choice. You can use either the instruct model's eos_token or the base model's eos_token :) For the Qwen model, I think using the base model's tokenizer is...
Hi @Wodswos, thanks for your detailed feedback! Your understanding of `critic/kl` is correct, and I think the actor model weights are not being updated after the upgrade to v0.6.0. I think this...