Hao Ding
In the [paper](https://arxiv.org/pdf/2206.08686.pdf) you provide, it is stated that "Each agent i follows a shared policy". In the codebase, however, I only found implementations that resemble [MAPPO](https://github.com/marlbenchmark/on-policy)'s "SeperatedBuffer" and "SeperatedRunner", ...
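To make the distinction I mean concrete, here is a minimal sketch (illustrative only, not this repo's actual classes; `PolicyNet`, `obs_dim`, and `act_dim` are hypothetical placeholders) of a shared policy versus MAPPO-style separated policies:

```python
import torch.nn as nn

class PolicyNet(nn.Module):
    """A toy policy network standing in for whatever the repo actually uses."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs):
        return self.net(obs)

n_agents, obs_dim, act_dim = 3, 16, 4

# Shared policy, as the paper states: one set of parameters,
# and every agent i queries the same network.
shared_policy = PolicyNet(obs_dim, act_dim)
shared_policies = [shared_policy] * n_agents  # same object, shared weights

# Separated policies, which is what SeperatedBuffer / SeperatedRunner
# suggest: each agent trains its own parameters on its own buffer.
separated_policies = [PolicyNet(obs_dim, act_dim) for _ in range(n_agents)]
```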
FIX #6015
FIX #5827
FIX #5611
FIX #5600

This should work for [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) and [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct).

You can serve an OpenAI-compatible API with:

```shell
python -m vllm.entrypoints.openai.api_server \
    --served-model-name gte-Qwen2-7B-instruct \
    ...
```
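Once the server is up, the embeddings endpoint can be queried with the standard OpenAI client. A minimal sketch, assuming the server listens on the default `http://localhost:8000/v1` and was started with the flags above:

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server on the default port 8000;
# the base URL and API key value here are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="gte-Qwen2-7B-instruct",  # must match --served-model-name
    input=["What is the capital of France?"],
)
print(len(response.data[0].embedding))  # embedding dimensionality
```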
The `num_return_sequences` interface is reserved for algorithms that require multiple samples for a single prompt, such as [GRPO](https://github.com/deepseek-ai/DeepSeek-Math), which I'm currently developing. It should have no effect on the current algorithms...
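For context, the reason GRPO needs multiple samplings per prompt is that it normalizes each completion's reward against the other completions in its group, instead of using a learned value baseline. A minimal sketch of that group-relative advantage computation (illustrative only, not this codebase's implementation):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages as in GRPO.

    `rewards` has shape (num_prompts, num_return_sequences): one row of
    sampled completions per prompt. Each reward is normalized by the mean
    and std of its own group, so no critic/value network is required.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, num_return_sequences=4 samples each.
rewards = np.array([[1.0, 0.0, 0.5, 1.0],
                    [0.2, 0.8, 0.4, 0.6]])
print(grpo_advantages(rewards))
```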