gorilla
[BFCL] Wrong Format in the Possible Answers of Live Parallel Multiple
Describe the issue
There are several format issues in the possible answers of the live test cases.
ID datapoint
- Datapoint / Model Handler permalink: https://github.com/ShishirPatil/gorilla/blob/main/berkeley-function-call-leaderboard/data/possible_answer/BFCL_v2_live_parallel_multiple.json
Here are failed examples from my model (IMHO, most of them are valid): sample.txt
What is the issue
- Some live test cases treat a string-typed parameter as an array of strings, so correct answers fail.
- Function names are inconsistent in some live tests. For instance, the function x**2 is replaced with x^2, and the original lambda function names are rejected.
- Some test cases translate certain function parameters. For example, if the user gives a location in Chinese, the possible answers accept only the translated (non-Chinese) version, so an answer that keeps the original text is marked as a mismatch.
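To make the three cases concrete, here is a minimal sketch of how each one fails under an exact-match comparison. It assumes each parameter's possible answer is a list of accepted values. strict_match is a hypothetical helper, not the actual BFCL checker, and all values are made up for illustration rather than taken from the dataset.

```python
def strict_match(model_value, possible_values):
    """Hypothetical checker: accept only an exact match with one accepted value."""
    return model_value in possible_values


results = {
    # 1. A string parameter whose accepted answer is stored as a list of strings.
    "string_vs_array": strict_match("Berkeley, CA", [["Berkeley, CA"]]),
    # 2. The accepted answer rewrites x**2 as x^2.
    "rewritten_expression": strict_match("lambda x: x**2", ["lambda x: x^2"]),
    # 3. The user wrote the location in Chinese; only the translation is accepted.
    "translated_value": strict_match("北京", ["Beijing"]),
}

for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
```

All three illustrative answers are rejected, even though each one is arguably what the user asked for.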
Proposed Changes
Correct the possible answers.