ML-Bench

Published 20 hours ago •


Metadata

The Official Repo of ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code (https://arxiv.org/abs/2311.09835)

Readme
Issues

2 ML-Bench issues

The GPT3.5 result could not be reproduced

I found that two parameters in script/run.sh do not work correctly: the type="quarter" parameter is not defined or used in query_gpt.py, and the instructions="extend_instructions" parameter raises KeyError: 'extend_instructions'. How do...

iiinsight

Fix typo and TOC

cz3k

About


code-generation

codegeneration

llm

gpt-4

351

Stars

Forks

Watchers

Owner

gersteinlab
