gemini-cli icon indicating copy to clipboard operation
gemini-cli copied to clipboard

gemini failure to export lmarena leaderboard

Open DanCard opened this issue 1 week ago • 3 comments

What happened?

claude was able to accomplish the task

console-gemini-failure-to-export-leaderboard.txt

What did you expect to happen?

export lmarena leaderboard

Client information

  • CLI Version: 0.20.2
  • Git Commit: af894e468
  • Session ID: e2541d47-0971-4aa8-82e1-019df213ad03
  • Operating System: linux v25.2.1
  • Sandbox Environment: no sandbox
  • Model Version: auto
  • Memory Usage: 488.8 MB

Login information

google account

Anything else we need to know?

I'm happy to share what ever details you need.

DanCard avatar Dec 17 '25 20:12 DanCard

Hi @DanCard , Thank you for reporting this! To help us investigate this issue, could you please provide the following details?

  • Reproduction Steps: Could you list the exact commands or prompts you used to trigger the export? Knowing the specific workflow helps us replicate the failure in our environment.

  • Log Contents: To ensure the data is easily searchable and accessible for our engineers, could you please copy the contents of the .txt file you attached and paste them directly into a comment here?

  • Observed Error: Besides the failure to export, did the CLI return a specific error message or code (e.g., a timeout, a parsing error, or a 403)?

We really appreciate your help in making the tool better. Once we have those logs and steps, we can dive deeper into the session logs!

galz10 avatar Dec 17 '25 21:12 galz10

I changed model settings to enable preview models. Was still horrible. See attached.

console-gemini-failure-to-export-leaderboard-preview.txt

DanCard avatar Dec 17 '25 21:12 DanCard

Reproduction Steps: Could you list the exact commands or prompts you used to trigger the export? Knowing the specific workflow helps us replicate the failure in our environment.

You can see the prompts in the log. Summary: Export table at: https://lmarena.ai/leaderboard/text/overall-no-style-control filter for only open weight models Add parameter count Only keep models with less than 100b parameters. Export to csv file or google sheet.

Observed Error: Besides the failure to export, did the CLI return a specific error message or code (e.g., a timeout, a parsing error, or a 403)?

none.

Paste of second log below:

███ █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ ░░░███ ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███ ░░░███ ███ ░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ ░░░███ ░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ ███░ ░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ ███░ ░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ ███░ ░░█████████ ██████████ █████ █████ █████ █████ ░░█████ █████ ░░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░

Tips for getting started:

  1. Ask questions, edit files, or run commands.
  2. Be specific for the best results.
  3. /help for more information. 12:53:03 Wed Dec 17 dcar@bosgame ~/projects/gemini $ gemini

███ █████████ ██████████ ██████ ██████ █████ ██████ █████ █████ ░░░███ ███░░░░░███░░███░░░░░█░░██████ ██████ ░░███ ░░██████ ░░███ ░░███ ░░░███ ███ ░░░ ░███ █ ░ ░███░█████░███ ░███ ░███░███ ░███ ░███ ░░░███ ░███ ░██████ ░███░░███ ░███ ░███ ░███░░███░███ ░███ ███░ ░███ █████ ░███░░█ ░███ ░░░ ░███ ░███ ░███ ░░██████ ░███ ███░ ░░███ ░░███ ░███ ░ █ ░███ ░███ ░███ ░███ ░░█████ ░███ ███░ ░░█████████ ██████████ █████ █████ █████ █████ ░░█████ █████ ░░░ ░░░░░░░░░ ░░░░░░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░ ░░░░░

Tips for getting started:

  1. Ask questions, edit files, or run commands.
  2. Be specific for the best results.
  3. /help for more information.

Export table at: https://lmarena.ai/leaderboard/text/overall-no-style-control filter for only open weight models Add parameter count Only keep models with less than 100b parameters. Export to csv file or google sheet.

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ ReadFile leaderboard.html │ │ │ │ Read all 3 lines from leaderboard.html (some lines were shortened) │ │ │ │ ✓ ReadFile process_leaderboard.py │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell grep "License" -A 20 leaderboard.html | head -n 20 [current working directory /home/dcar/projects/gemini] │ │ │ │ ...td>

1057
(1049, 1066)93292<td │ │ class=" small">OpenAIProprietary<a │ │ href="https://openai.com/blog/new-models-and-developer-products-announced-at-devday">Homepage2023-04<td │ │ class=" small">23llama-3-8b-instruct-turbo <td │ │ class="text-center small">1251
(1230, 1271)1055
<span │ │ style="font-size: 0.7em;">(1034, 1075)6560<a │ │ href="https://fireworks.ai/">FireworksOpen-weight<a │ │ href="https://fireworks.ai/blog/fireworks-llama-3-instruct-turbo">Homepage24<td │ │ class=" small">gemma-1.1-7b-it 1243
<span │ │ style="font-size: 0.7em;">(1230, 1256)1047
(1034, 1060)<td │ │ class="text-center small">30097GoogleOpen-weight<td │ │ class=" small">Homepage25qwen1.5-14b-chat 1239
(1222, 1256)1043
(1026, │ │ 1060)13284AlibabaOpen-weightHomepage26<a href="/model/mistral-7b-instruct-v0.2" │ │ class="model_name">mistral-7b-instruct-v0.2 1234
(1220, │ │ 1248)1038
(1024, 1052)24945Mistral AIOpen-weight<a │ │ href="https://mistral.ai/news/announcing-mistral-7b/">Homepage27starling-lm-7b-beta 1229
<span │ │ style="font-size: 0.7em;">(1210, 1247)1033
(1014, 1051)<td │ │ class="text-center small">9809NexusflowOpen-weightHomepage28gemma-7b-it 1227
<span │ │ style="font-size: 0.7em;">(1215, 1239)1031
(1019, 1043)<td │ │ class="text-center small">38260GoogleOpen-weight<td │ │ class=" small">Homepage29<td │ │ class=" small">Llama-2-70B-chat-hf 1219
<span │ │ style="font-size: 0.7em;">(1209, 1229)1023
(1013, 1033)<td │ │ class="text-center small">62846MetaOpen-weightHomepage30yi-34b-chat 1216
<span │ │ style="font-size: 0.7em;">(1202, 1230)1020
(1006, 1034)<td │ │ class="text-center small">2347201.AIOpen-weightHomepage31Llama-2-13B-chat-hf 1198
<span │ │ style="font-size: 0.7em;">(1185, 1210)1002
(989, 1014)<td │ │ class="text-center small">43703MetaOpen-weightHomepage32claude-instant-1 1192
(1178, 1206)996
(982, │ │ 1010)26941AnthropicProprietaryHomepage33<a href="/model/qwen3-next-80b-a3b-instruct" │ │ class="model_name">qwen3-next-80b-a3b-instruct 1190
(1148, │ │ 1232)994
(952, 1036)1039AlibabaOpen-weight<a │ │ href="https://huggingface.co/Qwen/Qwen3-next-80B-A3B-Instruct">Homepage34<td │ │ class=" small">vicuna-33b-v1.3 1188
<span │ │ style="font-size: 0.7em;">(1170, 1205)992
(974, 1009)<td │ │ class="text-center small">12030LMSYSOpen-weightHomepage35<td │ │ class=" small">Llama-2-7B-chat-hf 1173
<span │ │ style="font-size: 0.7em;">(1159, 1188)977
(963, 992)<td │ │ class="text-center small">32007MetaOpen-weightHomepage36<td │ │ class=" small">chatglm3-6b 1146
(1129, 1164)950
(933, 968)9784Zhipu.AIOpen-weight<a │ │ href="https://huggingface.co/THUDM/chatglm3-6b">Homepage37<a │ │ href="/model/vicuna-13b-v1.5-16k" class="model_name">vicuna-13b-v1.5-16k 1140
(1118, 1162)944
(922, 966)5783LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage38<a │ │ href="/model/vicuna-7b-v1.5" class="model_name">vicuna-7b-v1.5 1128
(1106, │ │ 1150)932
(910, 954)5693LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage39<a │ │ href="/model/claude-2.1" class="model_name">claude-2.1 1125
(1108, │ │ 1142)929
(912, 946)10185AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2-1">Homepage40<a │ │ href="/model/claude-2.0" class="model_name">claude-2.0 1106
(1088, │ │ 1124)910
(892, 928)9936AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2">Homepage41<a │ │ href="/model/bard" class="model_name">bard 1091
(1077, 1104)<td │ │ class="text-center small">895
(881, 908)32938<a │ │ href="https://deepmind.google/">GoogleProprietary<a │ │ href="https://bard.google.com/">Homepage42<a │ │ href="/model/palm-2-chatbot" class="model_name">palm-2-chatbot 1088
(1074, │ │ 1102)892
(878, 906)23681GoogleProprietary<a │ │ href="https://ai.google/discover/palm2">Homepage43<a │ │ href="/model/gpt-3.5-turbo-0314" class="model_name">gpt-3.5-turbo-0314 1086
(1073, 1100)890
(877, 904)28859OpenAIProprietary<a │ │ href="https://openai.com/blog/introducing-chatgpt-and-whisper-apis">Homepage44<td │ │ class=" small">llama-13b 1072
(1051, 1093)876
(855, 897)5459MetaOpen-weight<a │ │ href="https://ai.facebook.com/blog/large-language-model-llama-meta-ai/">Homepage45alpaca-13b 1039
<span │ │ style="font-size: 0.7em;">(1013, 1064)843
(817, 868)<td │ │ class="text-center small">4839StanfordOpen-weight<td │ │ class=" small">Homepage46fastchat-t5-3b-v1.0 1003
(977, 1030)807
(781, │ │ 834)4384LMSYSOpen-weightHomepage<td │ │ class=" small">47chatglm-6b 985
(961, 1010)789
(765, │ │ 814)4980Zhipu.AIOpen-weightHomepage<td │ │ class=" small">48vicuna-13b-v1.1 980
(961, 999)784
(765, │ │ 803)8470LMSYSOpen-weightHomepage<td │ │ class=" small">49claude-1 972
<span │ │ style="font-size: 0.7em;">(954, 990)776
(758, 794)<td │ │ class="text-center small">8556AnthropicProprietary<td │ │ class=" small">Homepage50mpt-7b-chat 966
<span │ │ style="font-size: 0.7em;">(943, 989)770
(747, 793)<td │ │ class="text-center small">5400MosaicMLOpen-weight<td │ │ class=" small">Homepage51<td │ │ class=" small">guanaco-33b-merged 949
<span │ │ style="font-size: 0.7em;">(919, 979)753
(723, 783)<td │ │ class="text-center small">2911N/AOpen-weight<a │ │ href="https://huggingface.co/timdettmers/guanaco-33b-merged">Homepage52dolly-v2-12b 904
(875, 934)708
(679, 738)3200DatabricksOpen-weight<a │ │ href="https://www.databricks.com/blog/2023/04/12/dolly-v2-databricks-releases-new-open-model.html">Homepage53<a href="/model/stablelm-tuned-alpha-7b" │ │ class="model_name">stablelm-tuned-alpha-7b 892
(859, 925)<td │ │ class="text-center small">696
(663, 729)2709<a │ │ href="https://stability.ai/">Stability AIOpen-weight<a │ │ href="https://github.com/stability-AI/stableLM">Homepage54<a │ │ href="/model/oasst-pythia-12b" class="model_name">oasst-pythia-12b 892
(860, 924)696
(664, 728)2720OpenAssistantOpen-weightHomepage55rwkv-4-raven-14b 875
(835, 915)679
(639, │ │ 719)1982RWKVOpen-weightHomepage<td │ │ class=" small">56koala-13b 850
<span │ │ style="font-size: 0.7em;">(822, 878)654
(626, 682)<td │ │ class="text-center small">3009BAIROpen-weight<td │ │ class=" small">Homepage │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell grep "Open-weight" leaderboard.html | grep -o 'class="model_name">[^<]*' | cut -d'>' -f2 [current working directory /home/dcar/projects/gem… │ │ │ │ gemini-3-pro │ │ gpt-4o-2024-05-13 │ │ gpt-4-turbo-2024-04-09 │ │ claude-3-opus-20240229 │ │ gemini-2.5-flash-image │ │ claude-3-sonnet-20240229 │ │ dbrx-instruct │ │ command-r-plus │ │ Llama-3-70B-Instruct │ │ qwen1.5-72b-chat │ │ mixtral-8x22b-instruct-v0.1 │ │ gemini-pro-dev-api │ │ claude-3-haiku-20240307 │ │ mixtral-8x7b-instruct-v0.1 │ │ wizardlm-2-8x22b │ │ gemma-2-27b-it │ │ Llama-3-8B-Instruct │ │ qwen1.5-32b-chat │ │ gemma-2-9b-it │ │ mistral-large-2402 │ │ gpt-4-0125-preview │ │ gpt-4-1106-preview │ │ llama-3-8b-instruct-turbo │ │ gemma-1.1-7b-it │ │ qwen1.5-14b-chat │ │ mistral-7b-instruct-v0.2 │ │ starling-lm-7b-beta │ │ gemma-7b-it │ │ Llama-2-70B-chat-hf │ │ yi-34b-chat │ │ Llama-2-13B-chat-hf │ │ claude-instant-1 │ │ qwen3-next-80b-a3b-instruct │ │ vicuna-33b-v1.3 │ │ Llama-2-7B-chat-hf │ │ chatglm3-6b │ │ vicuna-13b-v1.5-16k │ │ vicuna-7b-v1.5 │ │ claude-2.1 │ │ claude-2.0 │ │ bard │ │ palm-2-chatbot │ │ gpt-3.5-turbo-0314 │ │ llama-13b │ │ alpaca-13b │ │ fastchat-t5-3b-v1.0 │ │ chatglm-6b │ │ vicuna-13b-v1.1 │ │ claude-1 │ │ mpt-7b-chat │ │ guanaco-33b-merged │ │ dolly-v2-12b │ │ stablelm-tuned-alpha-7b │ │ oasst-pythia-12b │ │ rwkv-4-raven-14b │ │ koala-13b │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ WriteFile Writing to process_leaderboard_final.py │ │ │ │ 1 import pandas as pd │ │ 2 - from bs4 import BeautifulSoup │ │ 2 + import re │ │ 3 + import sys │ │ 4 │ │ 5 try: │ │ 6 with open('leaderboard.html', 'r') as f: │ │ 7 html_content = f.read() │ │ 8 │ │ 8 - soup = BeautifulSoup(html_content, 'html.parser') │ │ 9 - table = soup.find('table') │ │ 9 + # Parse HTML │ │ 10 + # flavor='lxml' is generally faster and more robust │ │ 11 + tables = pd.read_html(html_content, flavor='lxml') │ │ 12 + if not tables: │ │ 13 + print("No tables found in HTML.") │ │ 14 + sys.exit(1) │ │ 15 │ │ 11 - headers = [header.text.strip() for header in table.find_all('th')] │ │ 16 + df = tables[0] │ │ 17 │ │ 13 - # Clean up header names │ │ 14 - headers = ['Rank', 'Model', 'Arena Elo rating', 'LMSYS Elo rating', 'Votes', 'Organization', 'License', 'Homepage', 'Knowledge Cutoff'] │ │ 18 + # Clean column names (remove newlines usually caused by
) │ │ 19 + df.columns = [c.replace('\r', ' ').replace('\n', ' ').strip() for c in df.columns] │ │ 20 │ │ 16 - rows = [] │ │ 17 - for row in table.find('tbody').find_all('tr'): │ │ 18 - cells = row.find_all('td') │ │ 19 - row_data = [] │ │ 20 - for i, cell in enumerate(cells): │ │ 21 - # For the Elo columns, get the text before the
tag │ │ 22 - if i == 2 or i == 3: │ │ 23 - # Get the content of the cell and split it by the
tag │ │ 24 - elo_parts = str(cell).split('
') │ │ 25 - if len(elo_parts) > 0: │ │ 26 - # Take the first part and clean it from any html tags │ │ 27 - elo_rating = BeautifulSoup(elo_parts[0], 'html.parser').text.strip() │ │ 28 - row_data.append(elo_rating) │ │ 29 - else: │ │ 30 - row_data.append(cell.text.strip()) │ │ 21 + # Identify License column │ │ 22 + # Based on inspection, it is 'License' │ │ 23 + if 'License' not in df.columns: │ │ 24 + # Try to find a column that looks like it │ │ 25 + for col in df.columns: │ │ 26 + if 'License' in col: │ │ 27 + df.rename(columns={col: 'License'}, inplace=True) │ │ 28 + break │ │ 29 │ │ 32 - else: │ │ 33 - row_data.append(cell.text.strip()) │ │ 34 - rows.append(row_data) │ │ 30 + if 'License' not in df.columns: │ │ 31 + print("Could not find License column. Columns are:", df.columns) │ │ 32 + sys.exit(1) │ │ 33 │ │ 36 - df = pd.DataFrame(rows, columns=headers) │ │ 34 + # Filter for Open-weight │ │ 35 + # Normalize to handle potential whitespace │ │ 36 + df['License'] = df['License'].astype(str).str.strip() │ │ 37 + df_open = df[df['License'].str.contains('Open-weight', case=False, na=False)].copy() │ │ 38 │ │ 38 - # Add Parameter count column │ │ 39 - df['Parameter count'] = df['Model'].str.extract(r'(\d+[B|b])')[0] │ │ 39 + print(f"Found {len(df_open)} Open-weight models.") │ │ 40 │ │ 41 + # Function to extract parameters │ │ 42 + def extract_params(name): │ │ 43 + name_lower = str(name).lower() │ │ 44 │ │ 42 - df.to_csv('leaderboard.csv', index=False) │ │ 45 + # Known models without explicit size in name │ │ 46 + if 'dbrx-instruct' in name_lower: return 132.0 │ │ 47 + if 'command-r-plus' in name_lower: return 104.0 │ │ 48 + if 'command-r' in name_lower: return 35.0 │ │ 49 + # Falcon 180B usually has 180b in name │ │ 50 + # Grok-1 is 314B │ │ 51 + if 'grok-1' in name_lower: return 314.0 │ │ 52 │ │ 44 - except FileNotFoundError: │ │ 45 - with open('leaderboard.csv', 'w') as f: │ │ 46 - f.write("leaderboard.html not found.") │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 53 + # Regex for AxB (e.g., 8x22b, 8x7b) │ │ 54 + # Note: 8x22b = 176B, 8x7b = 56B │ │ 55 + match_axb = re.search(r'(\d+)x(\d+)[b]', name_lower) │ │ 56 + if match_axb: │ │ 57 + return float(match_axb.group(1)) * float(match_axb.group(2)) │ │ 58 + │ │ 59 + # Regex for standard XB (e.g., 70b, 7b, 1.5b) │ │ 60 + # We look for a number followed by 'b' │ │ 61 + # We also want to avoid matching things like "v1.5" as 1.5b unless it explicitly says b │ │ 62 + match_b = re.search(r'(\d+(?:.\d+)?)[b]', name_lower) │ │ 63 + if match_b: │ │ 64 + return float(match_b.group(1)) │ │ 65 + │ │ 66 + return None │ │ 67 + │ │ 68 + df_open['Parameter Count (B)'] = df_open['Model'].apply(extract_params) │ │ 69 + │ │ 70 + # Log models with unknown parameters for manual verification if needed │ │ 71 + unknown_params = df_open[df_open['Parameter Count (B)'].isna()] │ │ 72 + if not unknown_params.empty: │ │ 73 + print(f"Warning: Could not determine parameters for {len(unknown_params)} models:") │ │ 74 + for name in unknown_params['Model'].tolist(): │ │ 75 + print(f" - {name}") │ │ 76 + │ │ 77 + # Filter < 100B │ │ 78 + # We drop rows where parameter count is NaN because we can't verify the condition │ │ 79 + # User said: "Only keep models with less than 100b parameters." │ │ 80 + df_filtered = df_open[df_open['Parameter Count (B)'] < 100].copy() │ │ 81 + │ │ 82 + print(f"Filtered down to {len(df_filtered)} models with < 100B parameters.") │ │ 83 + │ │ 84 + # Sort by Rank if available, else by Elo │ │ 85 + if 'Rank' in df_filtered.columns: │ │ 86 + df_filtered.sort_values('Rank', inplace=True) │ │ 87 + │ │ 88 + # Export │ │ 89 + output_file = 'leaderboard_filtered.csv' │ │ 90 + df_filtered.to_csv(output_file, index=False) │ │ 91 + print(f"Exported to {output_file}") │ │ 92 + │ │ 93 except Exception as e: │ │ 48 - with open('leaderboard.csv', 'w') as f: │ │ 49 - f.write(f"An error occurred: {e}") │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 94 + print(f"An error occurred: {e}") │ │ 95 + sys.exit(1) │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell python3 process_leaderboard_final.py [current working directory /home/dcar/projects/gemini] │ │ │ │ Traceback (most recent call last): │ │ File "/home/dcar/projects/gemini/process_leaderboard_final.py", line 1, in │ │ import pandas as pd │ │ ModuleNotFoundError: No module named 'pandas' │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell ./venv/bin/python process_leaderboard_final.py [current working directory /home/dcar/projects/gemini] │ │ │ │ /home/dcar/projects/gemini/process_leaderboard_final.py:11: FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a │ │ future version. To read from a literal string, wrap it in a 'StringIO' object. │ │ tables = pd.read_html(html_content, flavor='lxml') │ │ Found 38 Open-weight models. │ │ Filtered down to 34 models with < 100B parameters. │ │ Exported to leaderboard_filtered.csv │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ ReadFile leaderboard_filtered.csv │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ✦ I have successfully processed the leaderboard data.

Actions Taken:

  1. Analyzed the leaderboard.html file found in your project directory.
  2. Parsed the HTML table and filtered for models with an "Open-weight" license.
  3. Extracted parameter counts from model names (e.g., "70b" -> 70, "8x7b" -> 56) and handled known models without explicit sizes in their names (e.g., Command R+, DBRX).
  4. Filtered the list to keep only models with less than 100 billion parameters.
  5. Exported the final result to leaderboard_filtered.csv.

Result: The file leaderboard_filtered.csv has been created with 34 models matching your criteria.

You can view the results with: cat leaderboard_filtered.csv

Can you export to google sheet?

✦ I cannot directly create or modify a Google Sheet in your account without authentication credentials.

However, I have created a Python script that can do this for you if you have a Google Cloud Service Account.

Option 1: Manual Import (Recommended/Fastest)

  1. Go to Google Sheets > File > Import > Upload.
  2. Select the leaderboard_filtered.csv file I generated in the previous step.

Option 2: Automated Script If you want to automate this, I've created a script below.

Prerequisites:

  1. Credentials: You need a service_account.json key file from the Google Cloud Console.
  2. Enable API: Enable the "Google Sheets API" and "Google Drive API" for your project.
  3. Share: Share your target Google Sheet (or a folder) with the client_email found in your service_account.json.

I will create the script export_to_gsheet.py for you.

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ WriteFile Writing to export_to_gsheet.py │ │ │ │ 1 import pandas as pd │ │ 2 import sys │ │ 3 import os │ │ 4 │ │ 5 try: │ │ 6 import gspread │ │ 7 from google.oauth2.service_account import Credentials │ │ 8 except ImportError: │ │ 9 print("Missing required libraries. Please run:") │ │ 10 print("pip install gspread google-auth") │ │ 11 sys.exit(1) │ │ 12 │ │ 13 # CONFIGURATION │ │ 14 # Path to your Service Account JSON key file │ │ 15 SERVICE_ACCOUNT_FILE = 'service_account.json' │ │ 16 # The name of the Google Sheet to write to (must already exist and be shared with the service account) │ │ 17 # OR the script can create a new one if you have Drive API access. │ │ 18 # For simplicity, let's assume we are creating a new one or opening by name. │ │ 19 SHEET_NAME = 'Open Weight Leaderboard' │ │ 20 CSV_FILE = 'leaderboard_filtered.csv' │ │ 21 │ │ 22 SCOPES = [ │ │ 23 'https://www.googleapis.com/auth/spreadsheets', │ │ 24 'https://www.googleapis.com/auth/drive' │ │ 25 ] │ │ 26 │ │ 27 def export_to_gsheet(): │ │ 28 if not os.path.exists(SERVICE_ACCOUNT_FILE): │ │ 29 print(f"Error: {SERVICE_ACCOUNT_FILE} not found.") │ │ 30 print("Please place your Google Cloud Service Account JSON key in this directory.") │ │ 31 return │ │ 32 │ │ 33 if not os.path.exists(CSV_FILE): │ │ 34 print(f"Error: {CSV_FILE} not found. Please run the processing script first.") │ │ 35 return │ │ 36 │ │ 37 print("Authenticating with Google...") │ │ 38 creds = Credentials.from_service_account_file(SERVICE_ACCOUNT_FILE, scopes=SCOPES) │ │ 39 client = gspread.authorize(creds) │ │ 40 │ │ 41 print(f"Reading {CSV_FILE}...") │ │ 42 df = pd.read_csv(CSV_FILE) │ │ 43 │ │ 44 # Replace NaN with empty string for clean sheet │ │ 45 df = df.fillna('') │ │ 46 │ │ 47 print(f"Opening/Creating sheet '{SHEET_NAME}'...") │ │ 48 try: │ │ 49 sheet = client.open(SHEET_NAME) │ │ 50 print(f"Found existing sheet: {sheet.url}") │ │ 51 except gspread.SpreadsheetNotFound: │ │ 52 print("Sheet not found, creating a new one...") │ │ 53 sheet = client.create(SHEET_NAME) ││ 54 print(f"Created new sheet: {sheet.url}") │ │ 55 print(f"IMPORTANT: Share this sheet with your personal email to view it.") │ │ 56 print(f"Service Account Email: {creds.service_account_email}") │ │ 57 │ │ 58 worksheet = sheet.get_worksheet(0) │ │ 59 │ │ 60 print("Writing data...") │ │ 61 # Update header │ │ 62 worksheet.update([df.columns.values.tolist()] + df.values.tolist()) │ │ 63 │ │ 64 print("Done!") │ │ 65 │ │ 66 if name == "main": │ │ 67 export_to_gsheet() │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ✦ To use the script:

  1. Install libraries: pip install gspread google-auth
  2. Add Credentials: Place your service_account.json in the current directory.
  3. Run: python export_to_gsheet.py

csv file looks like garbage. Why isn't qwen3-next on the list?

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ ReadFile leaderboard_filtered.csv │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ WriteFile Writing to process_leaderboard_final.py │ │ │ │ 7 html_content = f.read() │ │ 8 │ │ 9 # Parse HTML │ │ 10 - # flavor='lxml' is generally faster and more robust │ │ 10 tables = pd.read_html(html_content, flavor='lxml') │ │ 11 if not tables: │ │ 13 - print("No tables found in HTML.") │ │ 12 + print("No tables found.") │ │ 13 sys.exit(1) │ │ 14 │ │ 15 df = tables[0] │ │ 16 │ │ 18 - # Clean column names (remove newlines usually caused by
) │ │ 19 - df.columns = [c.replace('\r', ' ').replace('\n', ' ').strip() for c in df.columns] │ │ 17 + # Clean headers │ │ 18 + df.columns = [ │ │ 19 + c.replace('\r', ' ').replace('\n', ' ').strip() │ │ 20 + for c in df.columns │ │ 21 + ] │ │ 22 │ │ 21 - # Identify License column │ │ 22 - # Based on inspection, it is 'License' │ │ 23 - if 'License' not in df.columns: │ │ 24 - # Try to find a column that looks like it │ │ 23 + # Identify key columns │ │ 24 + # We want: Rank, Model, Arena Elo rating, LMSYS Elo rating, Votes, Organization, License, Parameter Count │ │ 25 + │ │ 26 + # Rename complex headers for clarity │ │ 27 + cols_map = {} │ │ 28 for col in df.columns: │ │ 26 - if 'License' in col: │ │ 27 - df.rename(columns={col: 'License'}, inplace=True) │ │ 28 - break │ │ 29 + if 'Arena Elo' in col: │ │ 30 + cols_map[col] = 'Arena Elo' │ │ 31 + elif 'LMSYS Elo' in col: │ │ 32 + cols_map[col] = 'LMSYS Elo' │ │ 33 + elif 'Knowledge' in col: │ │ 34 + cols_map[col] = 'Knowledge Cutoff' │ │ 35 │ │ 30 - if 'License' not in df.columns: │ │ 31 - print("Could not find License column. Columns are:", df.columns) │ │ 32 - sys.exit(1) │ │ 36 + df.rename(columns=cols_map, inplace=True) │ │ 37 │ │ 34 - # Filter for Open-weight │ │ 35 - # Normalize to handle potential whitespace │ │ 36 - df['License'] = df['License'].astype(str).str.strip() │ │ 37 - df_open = df[df['License'].str.contains('Open-weight', case=False, na=False)].copy() │ │ 38 + # Function to clean Elo columns (extract just the number) │ │ 39 + def clean_elo(val): │ │ 40 + val = str(val) │ │ 41 + # matches "1234 (1200, 1250)" or "1234" │ │ 42 + match = re.search(r'^(\d+)', val) │ │ 43 + if match: │ │ 44 + return int(match.group(1)) │ │ 45 + return val │ │ 46 │ │ 39 - print(f"Found {len(df_open)} Open-weight models.") │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 47 + if 'Arena Elo' in df.columns: │ │ 48 + df['Arena Elo'] = df['Arena Elo'].apply(clean_elo) │ │ 49 + if 'LMSYS Elo' in df.columns: │ │ 50 + df['LMSYS Elo'] = df['LMSYS Elo'].apply(clean_elo) │ │ 51 │ │ 41 - # Function to extract parameters │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 52 + # Clean License │ │ 53 + if 'License' in df.columns: │ │ 54 + df['License'] = df['License'].astype(str).str.strip() │ │ 55 + # Filter for Open-weight │ │ 56 + df = df[df['License'].str.contains('Open-weight', case=False, na=False)].copy() │ │ 57 + else: │ │ 58 + print("License column not found.") │ │ 59 + sys.exit(1) │ │ 60 + │ │ 61 + # Extract Parameters │ │ 62 def extract_params(name): │ │ 63 name_lower = str(name).lower() │ │ 64 │ │ 45 - # Known models without explicit size in name │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 65 + # Hardcoded known models │ │ 66 if 'dbrx-instruct' in name_lower: return 132.0 │ │ 67 if 'command-r-plus' in name_lower: return 104.0 │ │ 68 if 'command-r' in name_lower: return 35.0 │ │ 49 - # Falcon 180B usually has 180b in name │ │ 50 - # Grok-1 is 314B │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 69 if 'grok-1' in name_lower: return 314.0 │ │ 70 + if 'qwen3-next-80b' in name_lower: return 80.0 # Explicitly ensuring this works │ │ 71 │ │ 53 - # Regex for AxB (e.g., 8x22b, 8x7b) │ │ 54 - # Note: 8x22b = 176B, 8x7b = 56B │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 72 + # Regex for AxB (e.g., 8x22b) │ │ 73 match_axb = re.search(r'(\d+)x(\d+)[b]', name_lower) │ │ 74 if match_axb: │ │ 75 return float(match_axb.group(1)) * float(match_axb.group(2)) │ │ 76 │ │ 59 - # Regex for standard XB (e.g., 70b, 7b, 1.5b) │ │ 60 - # We look for a number followed by 'b' │ │ 61 - # We also want to avoid matching things like "v1.5" as 1.5b unless it explicitly says b │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 77 + # Regex for standard XB (e.g., 70b) │ │ 78 match_b = re.search(r'(\d+(?:.\d+)?)[b]', name_lower) │ │ 79 if match_b: │ │ 80 return float(match_b.group(1)) │ │ 81 │ │ 82 return None │ │ 83 │ │ 68 - df_open['Parameter Count (B)'] = df_open['Model'].apply(extract_params) │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 84 + df['Parameter Count (B)'] = df['Model'].apply(extract_params) │ │ 85 │ │ 70 - # Log models with unknown parameters for manual verification if needed │ │ 71 - unknown_params = df_open[df_open['Parameter Count (B)'].isna()] │ │ 72 - if not unknown_params.empty: │ │ 73 - print(f"Warning: Could not determine parameters for {len(unknown_params)} models:") │ │ 74 - for name in unknown_params['Model'].tolist(): │ │ 75 - print(f" - {name}") │ │ 76 - │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 86 # Filter < 100B │ │ 78 - # We drop rows where parameter count is NaN because we can't verify the condition │ │ 79 - # User said: "Only keep models with less than 100b parameters." │ │ 80 - df_filtered = df_open[df_open['Parameter Count (B)'] < 100].copy() │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 87 + df_filtered = df[df['Parameter Count (B)'] < 100].copy() │ │ 88 │ │ 82 - print(f"Filtered down to {len(df_filtered)} models with < 100B parameters.") │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 89 + # Drop useless columns │ │ 90 + cols_to_drop = ['Homepage'] # It usually just contains the text "Homepage" │ │ 91 + df_filtered.drop(columns=[c for c in cols_to_drop if c in df_filtered.columns], inplace=True) │ │ 92 │ │ 84 - # Sort by Rank if available, else by Elo │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 93 + # Sort by Rank (numeric) │ │ 94 if 'Rank' in df_filtered.columns: │ │ 95 + # Convert Rank to numeric, coercing errors │ │ 96 + df_filtered['Rank'] = pd.to_numeric(df_filtered['Rank'], errors='coerce') │ │ 97 df_filtered.sort_values('Rank', inplace=True) │ │ 98 │ │ 99 # Export │ │ 100 output_file = 'leaderboard_filtered.csv' │ │ 101 df_filtered.to_csv(output_file, index=False) │ │ 91 - print(f"Exported to {output_file}") │ │ ═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════ │ │ 102 + print(f"Exported {len(df_filtered)} models to {output_file}") │ │ 103 │ │ 104 + # Validation │ │ 105 + if df_filtered['Model'].str.contains('qwen3-next', case=False).any(): │ │ 106 + print("Confirmed: qwen3-next is present in the list.") │ │ 107 + else: │ │ 108 + print("WARNING: qwen3-next is MISSING.") │ │ 109 + │ │ 110 except Exception as e: │ │ 111 print(f"An error occurred: {e}") │ │ 112 sys.exit(1) │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell ./venv/bin/python process_leaderboard_final.py [current working directory /home/dcar/projects/gemini] │ │ │ │ /home/dcar/projects/gemini/process_leaderboard_final.py:10: FutureWarning: Passing literal html to 'read_html' is deprecated and will be removed in a │ │ future version. To read from a literal string, wrap it in a 'StringIO' object. │ │ tables = pd.read_html(html_content, flavor='lxml') │ │ Exported 34 models to leaderboard_filtered.csv │ │ Confirmed: qwen3-next is present in the list. │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ ReadFile leaderboard_filtered.csv │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ✦ I have thoroughly cleaned the data and regenerated the CSV file.

Improvements made:

  1. Fixed "Garbage" formatting: I separated the Elo rating from the confidence interval (e.g., converted "1190 (1148, 1232)" to just 1190). This removes the quotes and extra punctuation, making the CSV much cleaner and easier to read in spreadsheet software.
  2. Verified qwen3-next: I confirmed that qwen3-next-80b-a3b-instruct is present in the list at Rank 33.
  3. Removed Clutter: Deleted the empty/redundant "Homepage" column.

The file leaderboard_filtered.csv is now ready.

If you have your Google Cloud Service Account JSON file ready, you can run the export script I provided earlier (python export_to_gsheet.py) to upload this clean version to Google Sheets. Or you can inspect it here.

Dik

ℹ Request cancelled.

Did you hallucinate elo score? qwen3-next has a score of 1418.

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell grep "qwen3-next" leaderboard.html -A 5 -B 5 [current working directory /home/dcar/projects/gemini] │ │ │ │ ...td>

1057
(1049, 1066)93292<td │ │ class=" small">OpenAIProprietary<a │ │ href="https://openai.com/blog/new-models-and-developer-products-announced-at-devday">Homepage2023-04<td │ │ class=" small">23llama-3-8b-instruct-turbo <td │ │ class="text-center small">1251
(1230, 1271)1055
<span │ │ style="font-size: 0.7em;">(1034, 1075)6560<a │ │ href="https://fireworks.ai/">FireworksOpen-weight<a │ │ href="https://fireworks.ai/blog/fireworks-llama-3-instruct-turbo">Homepage24<td │ │ class=" small">gemma-1.1-7b-it 1243
<span │ │ style="font-size: 0.7em;">(1230, 1256)1047
(1034, 1060)<td │ │ class="text-center small">30097GoogleOpen-weight<td │ │ class=" small">Homepage25qwen1.5-14b-chat 1239
(1222, 1256)1043
(1026, │ │ 1060)13284AlibabaOpen-weightHomepage26<a href="/model/mistral-7b-instruct-v0.2" │ │ class="model_name">mistral-7b-instruct-v0.2 1234
(1220, │ │ 1248)1038
(1024, 1052)24945Mistral AIOpen-weight<a │ │ href="https://mistral.ai/news/announcing-mistral-7b/">Homepage27starling-lm-7b-beta 1229
<span │ │ style="font-size: 0.7em;">(1210, 1247)1033
(1014, 1051)<td │ │ class="text-center small">9809NexusflowOpen-weightHomepage28gemma-7b-it 1227
<span │ │ style="font-size: 0.7em;">(1215, 1239)1031
(1019, 1043)<td │ │ class="text-center small">38260GoogleOpen-weight<td │ │ class=" small">Homepage29<td │ │ class=" small">Llama-2-70B-chat-hf 1219
<span │ │ style="font-size: 0.7em;">(1209, 1229)1023
(1013, 1033)<td │ │ class="text-center small">62846MetaOpen-weightHomepage30yi-34b-chat 1216
<span │ │ style="font-size: 0.7em;">(1202, 1230)1020
(1006, 1034)<td │ │ class="text-center small">2347201.AIOpen-weightHomepage31Llama-2-13B-chat-hf 1198
<span │ │ style="font-size: 0.7em;">(1185, 1210)1002
(989, 1014)<td │ │ class="text-center small">43703MetaOpen-weightHomepage32claude-instant-1 1192
(1178, 1206)996
(982, │ │ 1010)26941AnthropicProprietaryHomepage33<a href="/model/qwen3-next-80b-a3b-instruct" │ │ class="model_name">qwen3-next-80b-a3b-instruct 1190
(1148, │ │ 1232)994
(952, 1036)1039AlibabaOpen-weight<a │ │ href="https://huggingface.co/Qwen/Qwen3-next-80B-A3B-Instruct">Homepage34<td │ │ class=" small">vicuna-33b-v1.3 1188
<span │ │ style="font-size: 0.7em;">(1170, 1205)992
(974, 1009)<td │ │ class="text-center small">12030LMSYSOpen-weightHomepage35<td │ │ class=" small">Llama-2-7B-chat-hf 1173
<span │ │ style="font-size: 0.7em;">(1159, 1188)977
(963, 992)<td │ │ class="text-center small">32007MetaOpen-weightHomepage36<td │ │ class=" small">chatglm3-6b 1146
(1129, 1164)950
(933, 968)9784Zhipu.AIOpen-weight<a │ │ href="https://huggingface.co/THUDM/chatglm3-6b">Homepage37<a │ │ href="/model/vicuna-13b-v1.5-16k" class="model_name">vicuna-13b-v1.5-16k 1140
(1118, 1162)944
(922, 966)5783LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage38<a │ │ href="/model/vicuna-7b-v1.5" class="model_name">vicuna-7b-v1.5 1128
(1106, │ │ 1150)932
(910, 954)5693LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage39<a │ │ href="/model/claude-2.1" class="model_name">claude-2.1 1125
(1108, │ │ 1142)929
(912, 946)10185AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2-1">Homepage40<a │ │ href="/model/claude-2.0" class="model_name">claude-2.0 1106
(1088, │ │ 1124)910
(892, 928)9936AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2">Homepage41<a │ │ href="/model/bard" class="model_name">bard 1091
(1077, 1104)<td │ │ class="text-center small">895
(881, 908)32938<a │ │ href="https://deepmind.google/">GoogleProprietary<a │ │ href="https://bard.google.com/">Homepage42<a │ │ href="/model/palm-2-chatbot" class="model_name">palm-2-chatbot 1088
(1074, │ │ 1102)892
(878, 906)23681GoogleProprietary<a │ │ href="https://ai.google/discover/palm2">Homepage43<a │ │ href="/model/gpt-3.5-turbo-0314" class="model_name">gpt-3.5-turbo-0314 1086
(1073, 1100)890
(877, 904)28859OpenAIProprietary<a │ │ href="https://openai.com/blog/introducing-chatgpt-and-whisper-apis">Homepage44<td │ │ class=" small">llama-13b 1072
(1051, 1093)876
(855, 897)5459MetaOpen-weight<a │ │ href="https://ai.facebook.com/blog/large-language-model-llama-meta-ai/">Homepage45alpaca-13b 1039
<span │ │ style="font-size: 0.7em;">(1013, 1064)843
(817, 868)<td │ │ class="text-center small">4839StanfordOpen-weight<td │ │ class=" small">Homepage46fastchat-t5-3b-v1.0 1003
(977, 1030)807
(781, │ │ 834)4384LMSYSOpen-weightHomepage<td │ │ class=" small">47chatglm-6b 985
(961, 1010)789
(765, │ │ 814)4980Zhipu.AIOpen-weightHomepage<td │ │ class=" small">48vicuna-13b-v1.1 980
(961, 999)784
(765, │ │ 803)8470LMSYSOpen-weightHomepage<td │ │ class=" small">49claude-1 972
<span │ │ style="font-size: 0.7em;">(954, 990)776
(758, 794)<td │ │ class="text-center small">8556AnthropicProprietary<td │ │ class=" small">Homepage50mpt-7b-chat 966
<span │ │ style="font-size: 0.7em;">(943, 989)770
(747, 793)<td │ │ class="text-center small">5400MosaicMLOpen-weight<td │ │ class=" small">Homepage51<td │ │ class=" small">guanaco-33b-merged 949
<span │ │ style="font-size: 0.7em;">(919, 979)753
(723, 783)<td │ │ class="text-center small">2911N/AOpen-weight<a │ │ href="https://huggingface.co/timdettmers/guanaco-33b-merged">Homepage52dolly-v2-12b 904
(875, 934)708
(679, 738)3200DatabricksOpen-weight<a │ │ href="https://www.databricks.com/blog/2023/04/12/dolly-v2-databricks-releases-new-open-model.html">Homepage53<a href="/model/stablelm-tuned-alpha-7b" │ │ class="model_name">stablelm-tuned-alpha-7b 892
(859, 925)<td │ │ class="text-center small">696
(663, 729)2709<a │ │ href="https://stability.ai/">Stability AIOpen-weight<a │ │ href="https://github.com/stability-AI/stableLM">Homepage54<a │ │ href="/model/oasst-pythia-12b" class="model_name">oasst-pythia-12b 892
(860, 924)696
(664, 728)2720OpenAssistantOpen-weightHomepage55rwkv-4-raven-14b 875
(835, 915)679
(639, │ │ 719)1982RWKVOpen-weightHomepage<td │ │ class=" small">56koala-13b 850
<span │ │ style="font-size: 0.7em;">(822, 878)654
(626, 682)<td │ │ class="text-center small">3009BAIROpen-weight<td │ │ class=" small">Homepage │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ ✓ Shell grep "qwen3-next" leaderboard.html [current working directory /home/dcar/projects/gemini] │ │ │ │ ...td>1057
(1049, 1066)93292<td │ │ class=" small">OpenAIProprietary<a │ │ href="https://openai.com/blog/new-models-and-developer-products-announced-at-devday">Homepage2023-04<td │ │ class=" small">23llama-3-8b-instruct-turbo <td │ │ class="text-center small">1251
(1230, 1271)1055
<span │ │ style="font-size: 0.7em;">(1034, 1075)6560<a │ │ href="https://fireworks.ai/">FireworksOpen-weight<a │ │ href="https://fireworks.ai/blog/fireworks-llama-3-instruct-turbo">Homepage24<td │ │ class=" small">gemma-1.1-7b-it 1243
<span │ │ style="font-size: 0.7em;">(1230, 1256)1047
(1034, 1060)<td │ │ class="text-center small">30097GoogleOpen-weight<td │ │ class=" small">Homepage25qwen1.5-14b-chat 1239
(1222, 1256)1043
(1026, │ │ 1060)13284AlibabaOpen-weightHomepage26<a href="/model/mistral-7b-instruct-v0.2" │ │ class="model_name">mistral-7b-instruct-v0.2 1234
(1220, │ │ 1248)1038
(1024, 1052)24945Mistral AIOpen-weight<a │ │ href="https://mistral.ai/news/announcing-mistral-7b/">Homepage27starling-lm-7b-beta 1229
<span │ │ style="font-size: 0.7em;">(1210, 1247)1033
(1014, 1051)<td │ │ class="text-center small">9809NexusflowOpen-weightHomepage28gemma-7b-it 1227
<span │ │ style="font-size: 0.7em;">(1215, 1239)1031
(1019, 1043)<td │ │ class="text-center small">38260GoogleOpen-weight<td │ │ class=" small">Homepage29<td │ │ class=" small">Llama-2-70B-chat-hf 1219
<span │ │ style="font-size: 0.7em;">(1209, 1229)1023
(1013, 1033)<td │ │ class="text-center small">62846MetaOpen-weightHomepage30yi-34b-chat 1216
<span │ │ style="font-size: 0.7em;">(1202, 1230)1020
(1006, 1034)<td │ │ class="text-center small">2347201.AIOpen-weightHomepage31Llama-2-13B-chat-hf 1198
<span │ │ style="font-size: 0.7em;">(1185, 1210)1002
(989, 1014)<td │ │ class="text-center small">43703MetaOpen-weightHomepage32claude-instant-1 1192
(1178, 1206)996
(982, │ │ 1010)26941AnthropicProprietaryHomepage33<a href="/model/qwen3-next-80b-a3b-instruct" │ │ class="model_name">qwen3-next-80b-a3b-instruct 1190
(1148, │ │ 1232)994
(952, 1036)1039AlibabaOpen-weight<a │ │ href="https://huggingface.co/Qwen/Qwen3-next-80B-A3B-Instruct">Homepage34<td │ │ class=" small">vicuna-33b-v1.3 1188
<span │ │ style="font-size: 0.7em;">(1170, 1205)992
(974, 1009)<td │ │ class="text-center small">12030LMSYSOpen-weightHomepage35<td │ │ class=" small">Llama-2-7B-chat-hf 1173
<span │ │ style="font-size: 0.7em;">(1159, 1188)977
(963, 992)<td │ │ class="text-center small">32007MetaOpen-weightHomepage36<td │ │ class=" small">chatglm3-6b 1146
(1129, 1164)950
(933, 968)9784Zhipu.AIOpen-weight<a │ │ href="https://huggingface.co/THUDM/chatglm3-6b">Homepage37<a │ │ href="/model/vicuna-13b-v1.5-16k" class="model_name">vicuna-13b-v1.5-16k 1140
(1118, 1162)944
(922, 966)5783LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage38<a │ │ href="/model/vicuna-7b-v1.5" class="model_name">vicuna-7b-v1.5 1128
(1106, │ │ 1150)932
(910, 954)5693LMSYSOpen-weight<a │ │ href="https://lmsys.org/blog/2023-06-22-v1.5/">Homepage39<a │ │ href="/model/claude-2.1" class="model_name">claude-2.1 1125
(1108, │ │ 1142)929
(912, 946)10185AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2-1">Homepage40<a │ │ href="/model/claude-2.0" class="model_name">claude-2.0 1106
(1088, │ │ 1124)910
(892, 928)9936AnthropicProprietary<a │ │ href="https://www.anthropic.com/index/claude-2">Homepage41<a │ │ href="/model/bard" class="model_name">bard 1091
(1077, 1104)<td │ │ class="text-center small">895
(881, 908)32938<a │ │ href="https://deepmind.google/">GoogleProprietary<a │ │ href="https://bard.google.com/">Homepage42<a │ │ href="/model/palm-2-chatbot" class="model_name">palm-2-chatbot 1088
(1074, │ │ 1102)892
(878, 906)23681GoogleProprietary<a │ │ href="https://ai.google/discover/palm2">Homepage43<a │ │ href="/model/gpt-3.5-turbo-0314" class="model_name">gpt-3.5-turbo-0314 1086
(1073, 1100)890
(877, 904)28859OpenAIProprietary<a │ │ href="https://openai.com/blog/introducing-chatgpt-and-whisper-apis">Homepage44<td │ │ class=" small">llama-13b 1072
(1051, 1093)876
(855, 897)5459MetaOpen-weight<a │ │ href="https://ai.facebook.com/blog/large-language-model-llama-meta-ai/">Homepage45alpaca-13b 1039
<span │ │ style="font-size: 0.7em;">(1013, 1064)843
(817, 868)<td │ │ class="text-center small">4839StanfordOpen-weight<td │ │ class=" small">Homepage46fastchat-t5-3b-v1.0 1003
(977, 1030)807
(781, │ │ 834)4384LMSYSOpen-weightHomepage<td │ │ class=" small">47chatglm-6b 985
(961, 1010)789
(765, │ │ 814)4980Zhipu.AIOpen-weightHomepage<td │ │ class=" small">48vicuna-13b-v1.1 980
(961, 999)784
(765, │ │ 803)8470LMSYSOpen-weightHomepage<td │ │ class=" small">49claude-1 972
<span │ │ style="font-size: 0.7em;">(954, 990)776
(758, 794)<td │ │ class="text-center small">8556AnthropicProprietary<td │ │ class=" small">Homepage50mpt-7b-chat 966
<span │ │ style="font-size: 0.7em;">(943, 989)770
(747, 793)<td │ │ class="text-center small">5400MosaicMLOpen-weight<td │ │ class=" small">Homepage51<td │ │ class=" small">guanaco-33b-merged 949
<span │ │ style="font-size: 0.7em;">(919, 979)753
(723, 783)<td │ │ class="text-center small">2911N/AOpen-weight<a │ │ href="https://huggingface.co/timdettmers/guanaco-33b-merged">Homepage52dolly-v2-12b 904
(875, 934)708
(679, 738)3200DatabricksOpen-weight<a │ │ href="https://www.databricks.com/blog/2023/04/12/dolly-v2-databricks-releases-new-open-model.html">Homepage53<a href="/model/stablelm-tuned-alpha-7b" ││ class="model_name">stablelm-tuned-alpha-7b 892
(859, 925)<td │ │ class="text-center small">696
(663, 729)2709<a │ │ href="https://stability.ai/">Stability AIOpen-weight<a │ │ href="https://github.com/stability-AI/stableLM">Homepage54<a │ │ href="/model/oasst-pythia-12b" class="model_name">oasst-pythia-12b 892
(860, 924)696
(664, 728)2720OpenAssistantOpen-weightHomepage55rwkv-4-raven-14b 875
(835, 915)679
(639, │ │ 719)1982RWKVOpen-weightHomepage<td │ │ class=" small">56koala-13b 850
<span │ │ style="font-size: 0.7em;">(822, 878)654
(626, 682)<td │ │ class="text-center small">3009BAIROpen-weight<td │ │ class=" small">Homepage │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ ✦ I did not hallucinate the score, but the local file leaderboard.html appears to have outdated or incorrect data for that specific model compared to what you are seeing (likely on the live leaderboard).

According to the leaderboard.html file provided in your project directory, the entry is:

1

33 2 qwen3-next-80b-a3b-instruct 3 1190
(1148, 1232)

My script correctly extracted 1190 from this file.

Would you like me to manually update the score to 1418 in the CSV file for you?

horribly wrong info

/quit

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮ │ │ │ Agent powering down. Goodbye! │ │ │ │ Interaction Summary │ │ Session ID: 33dfa027-22d1-4347-a0f7-ff8ac86f6d9b │ │ Tool Calls: 15 ( ✓ 15 x 0 ) │ │ Success Rate: 100.0% │ │ User Agreement: 100.0% (15 reviewed) │ │ Code Changes: +205 -75 │ │ │ │ Performance │ │ Wall Time: 9m 14s │ │ Agent Active: 3m 51s │ │ » API Time: 3m 27s (89.7%) │ │ » Tool Time: 23.8s (10.3%) │ │ │ │ │ │ Model Usage Reqs Input Tokens Cache Reads Output Tokens │ │ ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── │ │ gemini-2.5-flash-lite 8 18,661 0 3,049 │ │ gemini-3-pro-preview 19 134,041 355,478 4,193 │ │ │ │ Savings Highlight: 355,478 (70.0%) of input tokens were served from the cache, reducing costs. │ │ │ ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

01:02:51 Wed Dec 17 dcar@bosgame ~/projects/gemini $ 01:03:47 Wed Dec 17 dcar@bosgame ~/projects/gemini $ 01:03:49 Wed Dec 17 dcar@bosgame ~/projects/gemini $

DanCard avatar Dec 17 '25 21:12 DanCard