Add support for Tool Calling in vLLM
According to the vLLM docs, you can specify tools and custom tool parsers (example below).
Why is this useful:
- Tool calling is useful in general because it augments the model with additional data
- We can train models to run function calling dynamically as the model is generating
- This needs a custom tool parser, e.g. you could teach the model to call an API to retrieve additional data (see the `ExampleToolParser` below)
Problems with implementing this in veRL:
- veRL uses `inference_engine.generate()`, which does not support tool calling (it is only supported in `chat()`)
  - This potentially needs support in vLLM to make it happen
- The main challenge is that `generate` currently processes raw text prompts without interpreting structured data (e.g., function calls), whereas `chat` converts structured messages into prompts and integrates tools (see the sketch below)
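To illustrate the gap, here is a minimal sketch of the two entry points using vLLM's offline API, assuming a vLLM version where `LLM.chat` accepts a `tools` argument; the model name and the `get_weather` tool schema are placeholders:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(max_tokens=256)

# Hypothetical OpenAI-style function definition.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# chat() accepts structured messages plus a `tools` argument, which it
# renders into the prompt via the model's chat template.
llm.chat(
    [{"role": "user", "content": "What's the weather in Paris?"}],
    sampling_params=params,
    tools=tools,
)

# generate() only takes raw text prompts: there is no `tools` argument,
# so tool definitions and tool-call parsing must be handled manually.
llm.generate(["What's the weather in Paris?"], params)
```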
The custom tool parser example from the vLLM docs:

```python
from collections.abc import Sequence
from typing import Union

from vllm.entrypoints.openai.protocol import (ChatCompletionRequest,
                                              DeltaMessage,
                                              ExtractedToolCallInformation)
from vllm.entrypoints.openai.tool_parsers import ToolParser, ToolParserManager
from vllm.transformers_utils.tokenizer import AnyTokenizer


@ToolParserManager.register_module(["example"])
class ExampleToolParser(ToolParser):

    def __init__(self, tokenizer: AnyTokenizer):
        super().__init__(tokenizer)

    # Adjust the request, e.g. set skip_special_tokens to False
    # so tool-call markers survive in the output.
    def adjust_request(
            self, request: ChatCompletionRequest) -> ChatCompletionRequest:
        return request

    # Implement tool-call parsing for streaming responses.
    def extract_tool_calls_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
        request: ChatCompletionRequest,
    ) -> Union[DeltaMessage, None]:
        # Stub: pass the new text through unchanged; a real parser would
        # detect and emit partial tool calls here.
        return DeltaMessage(content=delta_text)

    # Implement tool-call parsing for non-streaming responses.
    def extract_tool_calls(
        self,
        model_output: str,
        request: ChatCompletionRequest,
    ) -> ExtractedToolCallInformation:
        # Stub: report no tool calls; a real parser would extract them
        # from model_output.
        return ExtractedToolCallInformation(tools_called=False,
                                            tool_calls=[],
                                            content=model_output)
```
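A parser like this is loaded into the OpenAI-compatible server via `vllm serve ... --enable-auto-tool-choice --tool-call-parser example --tool-parser-plugin <path to the file above>` and can then be exercised with a standard OpenAI client. A sketch, with the model name and `get_weather` schema as placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# Tool calls extracted by the registered parser, if any.
print(response.choices[0].message.tool_calls)
```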
An alternative way to implement this kind of feature is shown in the fork below. It implements an `LLMGenerationManager` that can add context on the fly while the rollout is being generated. This comes with downsides, like adding a lot of extra code, but it allows training the model to dynamically inject additional context, i.e. to learn when to retrieve.
https://github.com/PeterGriffinJin/Search-R1
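Conceptually, that approach pauses generation whenever the model emits a tool-call marker, retrieves, injects the result, and resumes. A minimal sketch, assuming Search-R1-style `<search>`/`<information>` tags and a hypothetical `retrieve` function (query -> str):

```python
import re

from vllm import LLM, SamplingParams

SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def rollout_with_retrieval(llm: LLM, prompt: str, retrieve,
                           max_turns: int = 4) -> str:
    # Stop generation as soon as the model closes a search call, keeping
    # the stop string so we can parse the query out of the output.
    params = SamplingParams(max_tokens=512,
                            stop=["</search>"],
                            include_stop_str_in_output=True)
    text = prompt
    for _ in range(max_turns):
        out = llm.generate([text], params)[0].outputs[0].text
        text += out
        match = SEARCH_RE.search(out)
        if match is None:  # no tool call: the rollout is finished
            break
        # Inject retrieved context and let the model continue from it.
        text += f"\n<information>{retrieve(match.group(1))}</information>\n"
    return text
```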