Huanzhi Mao issues

Results: 16 issues of Huanzhi Mao

[Non-Urgent] Remove Emoji from Leaderboard

It's not an urgent request, but can we remove those emoji from the leaderboard? It's distracting and I cannot really tell what that column is about by just looking at...

Leaderboard Update, in sync with BFCL May 14th Release (GPT-4o and Gemini)

As mentioned in #426, this PR adds 4 new models to the leaderboard. The model costs are also updated accordingly. This PR **DOES** change the leaderboard ranking. This PR **DOES...

A Windows installer for CLI

The installer is the `Gorilla-CLI.exe` in the `dist_exe` folder. The scripts that are used to generate the installer are also attached in the `dist_exe/scripts` folder.

[BFCL] Evaluation with Correct Precision Settings for Locally-Hosted Models

The following models are intended to be evaluated using `bfloat16` precision instead of `float16` according to their model card on HuggingFace. We should change the default precision setting for their...
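A per-model precision override could look roughly like the sketch below. The model names are hypothetical placeholders (the actual list is not shown in this excerpt), and `precision_for` is an illustrative helper, not a function from the BFCL codebase.

```python
# Default precision used when a model has no explicit override.
DEFAULT_PRECISION = "float16"

# Hypothetical model identifiers, for illustration only. The real entries
# would come from each model's card on HuggingFace.
PRECISION_OVERRIDES = {
    "example-org/model-a": "bfloat16",
    "example-org/model-b": "bfloat16",
}


def precision_for(model_name: str) -> str:
    """Return the precision a locally-hosted model should be evaluated with."""
    return PRECISION_OVERRIDES.get(model_name, DEFAULT_PRECISION)
```

Keeping the overrides in one table means the default stays unchanged for every model that is not listed.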

BFCL

[BFCL] Support Parallel Inference for Hosted Models

This PR introduces multi-threading to parallelize the API calls to the hosted model endpoints, which significantly speeds up the model response generation process. Users can specify the number of threads...
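As a minimal sketch of the idea (not the PR's actual code), a thread pool can issue several endpoint calls at once. Here `call_endpoint` is a hypothetical stand-in for a real API request, and `num_threads` is an assumed parameter name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def call_endpoint(prompt: str) -> str:
    """Hypothetical stand-in for one API call to a hosted model endpoint."""
    return f"response to: {prompt}"


def generate_responses(prompts: List[str], num_threads: int = 4) -> List[str]:
    """Send the prompts concurrently and return responses in input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the same order as the inputs,
        # even if the underlying calls finish out of order.
        return list(pool.map(call_endpoint, prompts))


if __name__ == "__main__":
    print(generate_responses(["a", "b", "c"], num_threads=2))
```

Threads suit this workload because each call spends most of its time waiting on the network rather than on the CPU.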

[BFCL] Leaderboard Update, in sync with #557, #568, #569, #570, and #573 (Dataset Fix & New Model)

This PR updates the leaderboard to reflect the score changes from the following merged PRs: - #557 - #568 and the addition of the following models: - #569...

Consistent Metric for OSS Model Cost

We need to be consistent in our metrics to determine the cost for OSS models. If a model is hosted locally and has `OSS_LATENCY`, then it should not belong to...

BFCL-General

[Leaderboard] Overhaul Leaderboard Table

The current BFCL leaderboard table is built using basic HTML, which has made it increasingly difficult to add new functionalities. To address this, the leaderboard table is overhauled to use...

Single Source of Truth

The mapping from test category name to test file path is repeated in three places, which is bad because the copies can drift out of sync:
- `test_files` in `eval_data_compilation.py`
- `test_categories` in `openfunctions_evaluation.py`
- `TEST_CATEGORIES` in `model_handler/constant.py`
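One way to consolidate the mapping is to define it once and have the other modules import it. The sketch below is an assumption about how that might look: the category names, file paths, and the `get_test_file` helper are hypothetical and not taken from the repository.

```python
# Single definition of the category-to-file mapping.
# Category names and paths here are hypothetical, for illustration only.
TEST_CATEGORY_TO_FILE = {
    "simple": "data/simple.json",
    "parallel": "data/parallel.json",
}


def get_test_file(category: str) -> str:
    """Look up the test file path for a category, failing loudly if unknown."""
    try:
        return TEST_CATEGORY_TO_FILE[category]
    except KeyError:
        raise ValueError(f"Unknown test category: {category!r}") from None
```

Every script that needs the mapping would then import it from this one module, so adding or renaming a category is a change in a single place.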

good first issue

BFCL-General

Add Sticky Effect for the First Three Columns on Leaderboard

This PR adds a sticky effect for the first three columns (rank, overall accuracy and model name) on the leaderboard. This feature is handy when viewing the leaderboard on small...