Add support for Tool Calling in vLLM
According to the vLLM docs, you can specify tools and custom tool parsers (example below).
Why is this useful:
- Tool calling is useful in general because it augments the model with additional data
- We can train models to run function calling dynamically as the model is generating
- This needs a custom tool parser, e.g. you could teach the model to call an API to retrieve additional data (see the `ExampleToolParser` below)
Problems with implementing this in veRL:
- veRL uses `inference_engine.generate()`, which does not support tool calling (it is only supported in `chat()`)
  - This potentially needs support in vLLM to make it happen
- The main challenge is that `generate` currently processes raw text prompts without interpreting structured data (e.g., function calls), whereas `chat` converts structured messages into prompts and integrates tools (see the sketch below)
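To illustrate the gap, here is a minimal sketch of the two entry points using vLLM's offline API, assuming a vLLM version where `LLM.chat` accepts a `tools` argument; the model name and the `get_weather` tool schema are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(max_tokens=256)

# Hypothetical OpenAI-style function definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# chat() accepts structured messages plus a `tools` argument, which it
# renders into the prompt via the model's chat template.
llm.chat(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    sampling_params=params,
    tools=tools,
)

# generate() only takes raw text prompts: there is no `tools` argument,
# so tool definitions and tool-call parsing must be handled manually.
llm.generate(["What's the weather in Paris?"], params)
```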
The custom tool parser example from the vLLM docs:

```python
from collections.abc import Sequence
from typing import Union

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                              DeltaMessage,
                                              ExtractedToolCallInformation)
from vllm.entrypoints.openai.tool_parsers import ToolParser, ToolParserManager
from vllm.transformers_utils.tokenizer import AnyTokenizer


@ToolParserManager.register_module(["example"])
class ExampleToolParser(ToolParser):

    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # Adjust the request, e.g. set skip_special_tokens to False
    # so tool-call markers survive in the output.
    def adjust_request(
            self, request: ChatCompletionRequest) -> ChatCompletionRequest:
        return request

    # Implement tool-call parsing for streaming responses.
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        # Stub: pass the new text through unchanged; a real parser would
        # detect and emit partial tool calls here.
        return DeltaMessage(content=delta_text)

    # Implement tool-call parsing for non-streaming responses.
    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        # Stub: report no tool calls; a real parser would extract them
        # from model_output.
        return ExtractedToolCallInformation(tools_called=False,
                                            tool_calls=[],
                                            content=model_output)
```
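A parser like this is loaded into the OpenAI-compatible server via `vllm serve ... --enable-auto-tool-choice --tool-call-parser example --tool-parser-plugin <path to the file above>` and can then be exercised with a standard OpenAI client. A sketch, with the model name and `get_weather` schema as placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Tool calls extracted by the registered parser, if any.
print(response.choices[0].message.tool_calls)
```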
An alternative way to implement this kind of feature is shown in the fork below. It implements an `LLMGenerationManager` that can add context on the fly while the rollout is being generated. This comes with downsides, like adding a lot of extra code, but it allows training the model to dynamically inject additional context, i.e. to learn when to retrieve.
https://github.com/PeterGriffinJin/Search-R1
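Conceptually, that approach pauses generation whenever the model emits a tool-call marker, retrieves, injects the result, and resumes. A minimal sketch, assuming Search-R1-style `<search>`/`<information>` tags and a hypothetical `retrieve` function (query -> str):

```python
import re

from vllm import LLM, SamplingParams

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def rollout_with_retrieval(llm: LLM, prompt: str, retrieve,
                           max_turns: int = 4) -> str:
    # Stop generation as soon as the model closes a search call, keeping
    # the stop string so we can parse the query out of the output.
    params = SamplingParams(max_tokens=512,
                            stop=["</search>"],
                            include_stop_str_in_output=True)
    text = prompt
    for _ in range(max_turns):
        out = llm.generate([text], params)[0].outputs[0].text
        text += out
        match = SEARCH_RE.search(out)
        if match is None:  # no tool call: the rollout is finished
            break
        # Inject retrieved context and let the model continue from it.
        text += f"\n<information>{retrieve(match.group(1))}</information>\n"
    return text
```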