StableToolBench issues

ToolBench Key

2

Hi, excellent work! I have submitted the form but haven't received the key or any response at all. I hope to get the key as quickly as possible, to...

Dandelionym

ToolBench Key

1

Your work has greatly impressed me. I would like to express my sincere thanks for your contribution to our research community. I intend to reproduce your excellent work for...

Hanlin1004

DFS.py changes causing functions to not be called with ToolLLaMa

1

Hi! I'm working on running ToolLLaMa against the StableToolBench server, and noticed an issue. I am executing the following:

```bash
python toolbench/inference/qa_pipeline.py \
    --tool_root_dir data_example/toolenv/tools/ \
    --backbone_model toolllama \
    --model_path...
```

kingb12

Could you release the reproduction data for your result

2

I'm testing the pass rate evaluation. Could you provide the reproduction data, as ToolBench does? Thanks for your reply.

p1nksnow

Error message: AttributeError: function_call

What is causing the following error? `Traceback (most recent call last): File "/home/guan/shared/StableToolBench/toolbench/tooleval/eval_pass_rate.py", line 100, in example, File "/root/anaconda3/envs/StableToolBench/lib/python3.9/concurrent/futures/_base.py", line 433, in result return self.__get_result() File "/root/anaconda3/envs/StableToolBench/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result raise self._exception File "/root/anaconda3/envs/StableToolBench/lib/python3.9/concurrent/futures/thread.py",...

wupaopao123

inference problem

Thanks for your work! When I tried to reproduce the inference, I encountered an issue. Is this normal? `openai.BadRequestError: Error code: 400 - {'error': {'message': "None is not of type...

farawayxxx

gpt3.5 > gpt4 on pass rate?

1

Great work on tool use! However, I have some questions about the results. I would be grateful if you could reply.

### 1. In `StableToolBench` I find pass rate result...

stanpcf

Implementation of DFSDT and ReACT

1

Thank you for your contribution. I noticed you mentioned that "We currently implement all models and algorithms supported by ToolBench." Could you clarify if this includes the DFSDT and ReACT...

JuhaoLiang1997

How is native LLM on this benchmark?

1

Hi, I'm wondering why this benchmark doesn't include results for native LLMs (such as llama2 and llama3). Do you plan to add these results to this work?

YenFuLin

requests.exceptions.ConnectionError: HTTPConnectionPool(host='8.218.239.54', port=8080

1

import requests
import json

url = 'http://0.0.0.0:8005/virtual'
# data = {
#     "category": "Media",
#     "tool_name": "newapi_for_media",
#     "api_name": "url",
#     "tool_input": {'url': 'https://api.socialmedia.com/friend/photos'},
#     "strip": "",
#     "toolbench_key": "XXXXXXX"...

lileishitou
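
For the connection error in the last issue above, a quick first check is whether the target host and port accept TCP connections at all, before debugging the request payload. The following is a minimal sketch using only the Python standard library. The helper name `can_connect` and the default timeout are assumptions for illustration and are not part of StableToolBench.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds.

    Hypothetical helper for illustration; not part of StableToolBench.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("127.0.0.1", 8005)` would probe a locally running server on the port used in the snippet, and `can_connect("8.218.239.54", 8080)` would probe the host named in the error message. Note that `0.0.0.0` is normally a bind address, so a client typically connects to `127.0.0.1` or the machine's actual IP instead.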
