Guangming Sheng
@AIRobotZhang In FSDP, `ShardingStrategy.FULL_SHARD` is equivalent to ZeRO-3.
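For reference, a rough summary (my own mapping, not taken from the FSDP docs) of how the FSDP sharding strategies line up with DeepSpeed's ZeRO stages:

```python
# Illustrative mapping only: FSDP ShardingStrategy names vs. the
# DeepSpeed ZeRO stage they roughly correspond to.
ZERO_EQUIVALENT = {
    "FULL_SHARD": 3,     # shard parameters, gradients, and optimizer state (ZeRO-3)
    "SHARD_GRAD_OP": 2,  # shard gradients and optimizer state only (ZeRO-2)
    "NO_SHARD": 0,       # nothing sharded; behaves like plain DDP
}

print(ZERO_EQUIVALENT["FULL_SHARD"])  # 3
```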
Hi @accupham , thanks for your questions! > So what would be the best way to add in dynamic function calling? Hook the [generate method of vLLM's LLM class](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L376), then...
> Actually-- I thought about it a bit more. Perhaps the best way is to implement a custom LogitsProcessor for vLLM, which does this function calling by hijacking the logits...
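For context, a vLLM logits processor is a callable that receives the tokens generated so far plus the next-token logits and returns modified logits. A minimal sketch of the "hijacking" idea, using plain Python floats instead of torch tensors and a hypothetical `force_tokens` helper (not vLLM API):

```python
import math

def force_tokens(allowed_ids):
    """Hypothetical logits-processor factory: mask every logit except the
    allowed token ids, steering generation into a function-call sequence.
    (Real vLLM processors take/return torch tensors; plain lists keep this
    sketch self-contained.)"""
    def processor(token_ids, logits):
        return [logit if i in allowed_ids else -math.inf
                for i, logit in enumerate(logits)]
    return processor

# Force the sampler toward token id 2 regardless of the raw scores.
proc = force_tokens({2})
masked = proc([], [0.1, 0.5, 0.2, 0.9])
print(masked)  # [-inf, -inf, 0.2, -inf]
```

In real usage one would pass such a callable via `SamplingParams(logits_processors=[...])` and switch the allowed-token set on and off when a function-call trigger token is detected.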
@accupham The API design is really nice from my perspective. However, it seems that it relies on vLLM 0.7.0 for the chat API. We're working on integrating it in: #116...
@accupham Sorry for the late response. I've been busy recently; I'll look into your proposal this weekend.
Hi @hxdtest , the megatron_v4.patch is necessary for veRL for two main reasons: 1. In veRL, we didn't initialize Megatron-LM with `initialize_megatron`, which initializes the global args. We only build...
@hxdtest , we haven't tested verl on the 405B model. I think we can try it by using a larger TP size in rollout or implementing pipeline parallelism in vLLM...
You may be encountering a memory leak in the FSDP checkpoint-saving method.
This is a design choice. You can use either the instruct model's eos_token or the base model's eos_token :) For the Qwen model, I think using the base model's tokenizer is...
Hi @Wodswos, thanks for your detailed feedback! Your understanding of `critic/kl` is correct, and I think the actor model weights are not being updated after the upgrade to v0.6.0. I think this...