Hao Ding
In the [paper](https://arxiv.org/pdf/2206.08686.pdf) you provide, it is stated that "Each agent i follows a shared policy". In the codebase, however, I only found implementations that resemble [MAPPO](https://github.com/marlbenchmark/on-policy)'s "SeperatedBuffer" and "SeperatedRunner", ...
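To make the distinction I mean concrete, here is a minimal sketch (illustrative only, not this repo's actual classes; `PolicyNet`, `obs_dim`, and `act_dim` are hypothetical placeholders) of a shared policy versus MAPPO-style separated policies:

```python
import torch.nn as nn

class PolicyNet(nn.Module):
    """A toy policy network standing in for whatever the repo actually uses."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs):
        return self.net(obs)

n_agents, obs_dim, act_dim = 3, 16, 4

# Shared policy, as the paper states: one set of parameters,
# and every agent i queries the same network.
shared_policy = PolicyNet(obs_dim, act_dim)
shared_policies = [shared_policy] * n_agents  # same object, shared weights

# Separated policies, which is what SeperatedBuffer / SeperatedRunner
# suggest: each agent trains its own parameters on its own buffer.
separated_policies = [PolicyNet(obs_dim, act_dim) for _ in range(n_agents)]
```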
FIX #6015
FIX #5827
FIX #5611
FIX #5600

This should work for [Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct) and [Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct).

You can serve an OpenAI-compatible API with:

```shell
python -m vllm.entrypoints.openai.api_server \
    --served-model-name gte-Qwen2-7B-instruct \
    ...
```
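Once the server is up, the embeddings endpoint can be queried with the standard OpenAI client. A minimal sketch, assuming the server listens on the default `http://localhost:8000/v1` and was started with the flags above:

```python
from openai import OpenAI

# Assumes a local vLLM OpenAI-compatible server on the default port 8000;
# the base URL and API key value here are placeholders.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="gte-Qwen2-7B-instruct",  # must match --served-model-name
    input=["What is the capital of France?"],
)
print(len(response.data[0].embedding))  # embedding dimensionality
```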
The `num_return_sequences` interface is reserved for algorithms that require multiple samples for a single prompt, such as [GRPO](https://github.com/deepseek-ai/DeepSeek-Math), which I'm currently developing. It should have no effect on the current algorithms...
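For context, the reason GRPO needs multiple samplings per prompt is that it normalizes each completion's reward against the other completions in its group, instead of using a learned value baseline. A minimal sketch of that group-relative advantage computation (illustrative only, not this codebase's implementation):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages as in GRPO.

    `rewards` has shape (num_prompts, num_return_sequences): one row of
    sampled completions per prompt. Each reward is normalized by the mean
    and std of its own group, so no critic/value network is required.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, num_return_sequences=4 samples each.
rewards = np.array([[1.0, 0.0, 0.5, 1.0],
                    [0.2, 0.8, 0.4, 0.6]])
print(grpo_advantages(rewards))
```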