[FEATURE REQUEST] Better exports on the website.
🚀 The feature, motivation and pitch
I'm working on a project where we compare several metrics from the leaderboards. However, we're interested in many of the values shown on the right-hand side of each cell, e.g., correlation coefficient vs accuracy for HellaSwag. Currently, only the left-hand value is included in the CSV export.
I have made a simple scraping tool to get around this, and when I find the time, I can gladly contribute to allow for a more flexible export system on the website directly.
In the meantime, here's a "solution":
import requests
from bs4 import BeautifulSoup

leaderboard_id = "norwegian-nlg"
url = f"https://scandeval.com/{leaderboard_id}/"
res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.content, "html.parser")
# The id attribute must be matched against a plain string, not a set.
table = soup.find("table", {"id": leaderboard_id})
data = []
rows = table.find("tbody").find_all("tr")
# Lower-cased model ids to keep (compared against model_id.lower()).
SELECTED_MODELS = [
    "meta-llama/meta-llama-3-8b-instruct",
    "google/gemma-2-2b-it",
]
parsable_metrics = ["NorNE-nb", "NorNE-nn"]


def parse_metric(metric, value):
    """Pick the value of interest from a 'VALUE ± STDDEV / VALUE ± STDDEV' cell."""
    left, right = value.split("/")
    # VALUE ± STDDEV
    lval, lstd = left.split("±")
    rval, rstd = right.split("±")
    lval = float(lval.strip())
    rval = float(rval.strip())
    lstd = float(lstd.strip())
    rstd = float(rstd.strip())
    # match/case requires Python 3.10+
    match metric:
        case "NorNE-nb":
            return rval  # micro-avg F1 with MISC tags
        case "NorNE-nn":
            return rval  # micro-avg F1 with MISC tags
        case _:
            raise ValueError(f"No parsing rule for metric: {metric}")
# Pair each header with its column index so cells stay aligned after filtering.
columns = list(enumerate(th.get_text(strip=True) for th in table.find_all("th")))
# skip Model (first header), we do that manually :-)
columns = columns[1:]
# skip versions, if desired:
columns = [(i, h) for i, h in columns if "version" not in h.lower()]

for row in rows:
    cells = row.find_all("td")
    model_id = cells[0].text.strip()
    # handle the "<modelname> (few-shot)" naming scheme
    model_id = model_id.split("(")[0].strip()
    if model_id.lower() not in SELECTED_MODELS:
        continue
    row_data = {"model": model_id}
    for i, col in columns:
        value = cells[i].text.strip()
        if col in parsable_metrics:
            row_data[col] = parse_metric(col, value)
        # you can add in other non-parsable metrics/versions etc. here
    data.append(row_data)

for model_result in data:
    print(model_result)
which yields:
{'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'NorNE-nb': 65.57, 'NorNE-nn': 65.44}
{'model': 'google/gemma-2-2b-it', 'NorNE-nb': 28.77, 'NorNE-nn': 30.27}
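For what it's worth, a more flexible export could keep both sides of every cell instead of hard-coding one per metric. Below is a minimal sketch of that idea, not how the website does it: it assumes each cell has the form "VALUE ± STDDEV / VALUE ± STDDEV" (as in the scraper above) and that the left-hand value is the primary metric. The helper names `split_cell` and `rows_to_csv` and the `_primary`/`_secondary` column suffixes are hypothetical.

```python
import csv
import io
import re

# Assumed cell format: "VALUE ± STDDEV / VALUE ± STDDEV"
CELL_PATTERN = re.compile(
    r"^\s*(-?\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*/"
    r"\s*(-?\d+(?:\.\d+)?)\s*±\s*(\d+(?:\.\d+)?)\s*$"
)


def split_cell(value):
    """Return (primary, primary_std, secondary, secondary_std), or None if unparsable."""
    match = CELL_PATTERN.match(value)
    if match is None:
        return None
    return tuple(float(group) for group in match.groups())


def rows_to_csv(rows, metrics):
    """Build CSV text from dicts mapping 'model' and metric names to raw cell text."""
    fieldnames = ["model"]
    for metric in metrics:
        fieldnames += [f"{metric}_primary", f"{metric}_secondary"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = {"model": row["model"]}
        for metric in metrics:
            parsed = split_cell(row.get(metric, ""))
            if parsed is not None:
                out[f"{metric}_primary"] = parsed[0]
                out[f"{metric}_secondary"] = parsed[2]
        # Columns without a parsed value are left empty by DictWriter.
        writer.writerow(out)
    return buffer.getvalue()


# Made-up placeholder values, only to show the output shape.
example_rows = [{"model": "example/model", "NorNE-nb": "1.0 ± 0.1 / 2.0 ± 0.2"}]
print(rows_to_csv(example_rows, ["NorNE-nb"]))
```

To use it with the scraper above, you would store the raw cell text per metric instead of calling `parse_metric`, then pass the collected rows to `rows_to_csv`.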
Alternatives
No response
Additional context
No response
Thanks for your feedback!
Do I understand correctly that you're asking for both the primary metric score and the secondary metric score to be included in the CSV export?
That would be a nice first step, and the exports could perhaps be expanded with more functionality later if that's something people would want :-)
no rush at all, though