
[FEATURE REQUEST] Better exports on the website.

Open tollefj opened this issue 1 year ago • 2 comments

🚀 The feature, motivation and pitch

I'm working on a project where we compare several metrics across the leaderboards. However, we're interested in many of the values shown in the right-hand split of each cell, e.g., the correlation coefficient vs. accuracy for HellaSwag. Currently, only the left-hand values are returned through the CSV export.

I have made a simple scraping tool to get around this, and when I find the time, I'd gladly contribute a more flexible export system to the website directly.

In the meantime, here's a "solution":

import requests
from bs4 import BeautifulSoup

id = "norwegian-nlg"
url = f"https://scandeval.com/{id}/"
res = requests.get(url)
soup = BeautifulSoup(res.content, "html.parser")
table = soup.find("table", {"id": id})
data = []

rows = table.find("tbody").find_all("tr")


SELECTED_MODELS = [
    "meta-llama/meta-llama-3-8b-instruct",
    "google/gemma-2-2b-it",
]

parsable_metrics = ["NorNE-nb", "NorNE-nn"]
def parse_metric(metric, value):
    # cells look like "LEFT ± STDDEV / RIGHT ± STDDEV"
    left, right = value.split("/")
    lval, lstd = (float(x.strip()) for x in left.split("±"))
    rval, rstd = (float(x.strip()) for x in right.split("±"))

    match metric:
        case "NorNE-nb" | "NorNE-nn":
            return rval  # micro-avg F1 with MISC tags


# skip Model (first header); we handle that manually :-)
headers = [th.get_text(strip=True) for th in table.find_all("th")][1:]
# skip version columns, if desired:
headers = [h for h in headers if "version" not in h.lower()]


for row in rows:
    row_data = {}
    cells = row.find_all("td")
    model_id = cells[0].text.strip()
    # handle the <modelname> (few-shot) naming scheme
    model_id = model_id.split("(")[0].strip()

    if model_id.lower() not in SELECTED_MODELS:
        continue

    row_data["model"] = model_id
    for i, col in enumerate(headers):
        value = cells[i + 1].text.strip()  # +1 as we added model already
        if col in parsable_metrics:
            row_data[col] = parse_metric(col, value)
        else:
            # other (non-parsed) columns, e.g. versions, could be collected here
            continue

    data.append(row_data)

for model_result in data:
    print(model_result)

which yields:

{'model': 'meta-llama/Meta-Llama-3-8B-Instruct', 'NorNE-nb': 65.57, 'NorNE-nn': 65.44}
{'model': 'google/gemma-2-2b-it', 'NorNE-nb': 28.77, 'NorNE-nn': 30.27}
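
Since the original ask is about CSV export, the parsed results can also be written straight back out to a CSV file with the stdlib csv module. A minimal sketch, assuming `data` is the list of dicts produced by the script above (the rows below are the printed values, repeated here so the snippet is self-contained):

```python
import csv

# Rows in the same shape as the script's `data` list.
data = [
    {"model": "meta-llama/Meta-Llama-3-8B-Instruct", "NorNE-nb": 65.57, "NorNE-nn": 65.44},
    {"model": "google/gemma-2-2b-it", "NorNE-nb": 28.77, "NorNE-nn": 30.27},
]

# One column per key; DictWriter handles ordering and quoting.
with open("scandeval_export.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["model", "NorNE-nb", "NorNE-nn"])
    writer.writeheader()
    writer.writerows(data)
```

The filename is arbitrary; `newline=""` is the documented way to open files for the csv module so it controls line endings itself.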

Alternatives

No response

Additional context

No response

tollefj avatar Oct 09 '24 14:10 tollefj

Thanks for your feedback!

Is it correctly understood that you're asking for both the primary metric score and the secondary metric score to be included in the CSV export?

saattrupdan avatar Oct 10 '24 06:10 saattrupdan

That would be a nice first step, and perhaps the exports could later be expanded with more functionality if that's something people would want :-)

no rush at all, though

tollefj avatar Oct 10 '24 07:10 tollefj