sharonyu-115
The problem is caused by an interface change in vLLM. Updating the `collective_rpc` function in `vllm_async_server.py` fixes it: ``` def collective_rpc( self, method: str | Callable, timeout: Optional[float] =...
> The changes generally look good, but there are some minor issues. We should add some tests for this feature; I suggest adding tests for the FP8 KV cache in the following...
Here are the latest experimental results. **Configuration** Model: Qwen3-8B-Base. Method: Dynamically calculate the QKV scales at the end of each training step and synchronize them to vLLM. Framework: NeMo-RL, vLLM +...
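
For readers unfamiliar with the method, here is a minimal sketch of what "dynamically calculating QKV scales" could look like. The comment above does not say how the scales are computed, so the amax-based per-tensor formula is an assumption, and `compute_scale` and `compute_qkv_scales` are hypothetical names, not functions from NeMo-RL or vLLM. The only fixed fact used is that 448 is the largest finite value in FP8 E4M3. Synchronizing the result to vLLM is not shown.

```python
# Assumption: per-tensor, amax-based scaling. The PR may use a different scheme.
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def compute_scale(values, fp8_max=FP8_E4M3_MAX, eps=1e-12):
    """Return a scale such that max(|values|) / scale == fp8_max.

    `values` is any iterable of floats (hypothetical stand-in for the
    activations observed during a training step).
    """
    amax = max((abs(v) for v in values), default=0.0)
    # Clamp to eps so an all-zero or empty tensor never yields a zero scale.
    return max(amax, eps) / fp8_max


def compute_qkv_scales(q, k, v):
    """Compute one scale each for the Q, K, and V activations."""
    return {
        "q_scale": compute_scale(q),
        "k_scale": compute_scale(k),
        "v_scale": compute_scale(v),
    }
```

In this sketch the scales would be recomputed from the current step's activations and then pushed to the inference engine, so the FP8 KV cache always quantizes with ranges that match the latest weights.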