
[FEATURE REQUEST] Better exports on the website.

Open tollefj opened this issue 1 year ago • 2 comments

🚀 The feature, motivation and pitch

I'm working on a project where we compare several metrics across the leaderboards. However, we're interested in many of the values shown in the right-hand split of each cell, e.g., the correlation coefficient vs. accuracy for HellaSwag. Currently, only the left-hand values are returned through the CSV export.

I have made a simple scraping tool to get around this, and when I find the time, I'd gladly contribute a more flexible export system to the website directly.

In the meantime, here's a "solution":

import requests
from bs4 import BeautifulSoup

id = "norwegian-nlg"
url = f"https://scandeval.com/{id}/"
res = requests.get(url)
soup = BeautifulSoup(res.content, "html.parser")
table = soup.find("table", {"id": id})
data = []

rows = table.find("tbody").find_all("tr")


SELECTED_MODELS = [
    "meta-llama/meta-llama-3-8b-instruct",
    "google/gemma-2-2b-it",
]

parsable_metrics = ["NorNE-nb", "NorNE-nn"]
def parse_metric(metric, value):
    # cells look like "LEFT ± STDDEV / RIGHT ± STDDEV"
    left, right = value.split("/")
    lval, lstd = (float(x.strip()) for x in left.split("±"))
    rval, rstd = (float(x.strip()) for x in right.split("±"))

    match metric:
        case "NorNE-nb" | "NorNE-nn":
            return rval  # micro-avg F1 with MISC tags


# skip Model (first header); we handle that manually :-)
headers = [th.get_text(strip=True) for th in table.find_all("th")][1:]
# skip version columns, if desired:
headers = [h for h in headers if "version" not in h.lower()]


for row in rows:
    row_data = {}
    cells = row.find_all("td")
    model_id = cells[0].text.strip()
    # handle the <modelname> (few-shot) naming scheme
    model_id = model_id.split("(")[0].strip()

    if model_id.lower() not in SELECTED_MODELS:
        continue

    row_data["model"] = model_id
    for i, col in enumerate(headers):
        value = cells[i + 1].text.strip()  # +1 as we added model already
        if col in parsable_metrics:
            row_data[col] = parse_metric(col, value)
        else:
            # other (non-parsed) columns, e.g. versions, could be collected here
            continue

    data.append(row_data)

for model_result in data:
    print(model_result)

which yields:

{'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'NorNE-nb': 65.57, 'NorNE-nn': 65.44}
{'model': 'google/gemma-2-2b-it', 'NorNE-nb': 28.77, 'NorNE-nn': 30.27}
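
Since the original ask is about CSV export, the parsed results can also be written straight back out to a CSV file with the stdlib csv module. A minimal sketch, assuming `data` is the list of dicts produced by the script above (the rows below are the printed values, repeated here so the snippet is self-contained):

```python
import csv

# Rows in the same shape as the script's `data` list.
data = [
    {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "NorNE-nb": 65.57, "NorNE-nn": 65.44},
    {"model": "google/gemma-2-2b-it", "NorNE-nb": 28.77, "NorNE-nn": 30.27},
]

# One column per key; DictWriter handles ordering and quoting.
with open("scandeval_export.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "NorNE-nb", "NorNE-nn"])
    writer.writeheader()
    writer.writerows(data)
```

The filename is arbitrary; `newline=""` is the documented way to open files for the csv module so it controls line endings itself.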

Alternatives

No response

Additional context

No response

tollefj avatar Oct 09 '24 14:10 tollefj

Thanks for your feedback!

Is it correctly understood that you're asking for both the primary metric score and the secondary metric score to be included in the CSV export?

saattrupdan avatar Oct 10 '24 06:10 saattrupdan

That would be a nice first step, and perhaps the exports could later be expanded with more functionality if that's something people would want :-)

no rush at all, though

tollefj avatar Oct 10 '24 07:10 tollefj