ClassEval
ClassEval is a benchmark for class-level code generation.
This is a good benchmark, thank you for that. Do you plan to add modern models like Opus, llama-3, granite, codeqwen1.5-chat and so on to the benchmark?
Hi, thank you very much for sharing this benchmark and all the hard work! I have a question regarding the test passing rates on the generated code. I followed the...
The following class names did not map directly to benchmark code and test file names and required the mapping below:

```python
SOLUTION_MAP = {
    "BankAccount": "Bank_Account_System",
    "Classroom": "ClassroomManagement",
    "ClassRegistrationSystem": "Class_Registration_System",
    "DatabaseProcessor": ...
```
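A minimal sketch of how such a remapping might be applied when locating a task's code and test files; the helper name, file layout, and file-name suffixes here are assumptions for illustration, not taken from the post:

```python
import os

# Abbreviated version of the post's mapping, for a self-contained example.
SOLUTION_MAP = {"BankAccount": "Bank_Account_System"}

# Hypothetical helper: resolve a generated class name to its benchmark
# file stem, falling back to the class name when no remapping is needed.
def resolve_benchmark_name(class_name: str) -> str:
    return SOLUTION_MAP.get(class_name, class_name)

stem = resolve_benchmark_name("BankAccount")             # -> "Bank_Account_System"
code_file = os.path.join("benchmark", f"{stem}.py")      # assumed layout
test_file = os.path.join("benchmark", f"{stem}_test.py")
print(code_file, test_file)
```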
I was able to achieve 100% with the benchmark code after disabling the following tests and patching test 56. Some of the tasks have an explicit test case while others...
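For context, one common way to disable an individual test in a unittest suite is the @unittest.skip decorator; a minimal sketch with placeholder names, not the actual ClassEval test classes:

```python
import unittest

class ExampleTaskTest(unittest.TestCase):
    @unittest.skip("disabled: asserts implementation-specific behavior")
    def test_case_6(self):
        self.fail("never runs")

    def test_case_1(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
```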
Question about Dataset Labels
Hi, I have a question regarding the labels in the dataset. Does this dataset include labels for the following topics?
- Management Systems
- Data Formatting...
In the AccessGatewayFilter skeleton, there is this method:

```python
def is_start_with(self, request_uri):
    """
    Check if the request URI starts with certain prefixes.
    :param request_uri: str, the URI of the request...
```
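Based on the docstring alone, a plausible implementation looks like the sketch below; the prefix whitelist is an assumption, and the actual ClassEval reference solution may check different prefixes:

```python
class AccessGatewayFilter:
    def is_start_with(self, request_uri):
        """
        Check if the request URI starts with certain prefixes.
        :param request_uri: str, the URI of the request
        :return: bool, True if the URI starts with an allowed prefix
        """
        start_with = ("/api", "/login")  # assumed whitelist for illustration
        return request_uri.startswith(start_with)
```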
```python
def test_interpret_6(self):
    context = AutomaticGuitarSimulator(" ")
    play_list = context.interpret()
    self.assertEqual(play_list, [{'Chord': '', 'Tune': ''}, {'Chord': '', 'Tune': ''}])  # better to remove this test
```
```python
def test_interpret_9(self):
    context ...
```
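The asserted output here likely falls out of naive whitespace splitting: splitting a lone space on " " yields two empty tokens, which would map to two empty Chord/Tune entries. This is an inference about the reference implementation, not something stated in the thread:

```python
# Splitting a single space on " " produces two empty strings,
# which would explain the two empty {'Chord': '', 'Tune': ''} entries.
print(" ".split(" "))  # ['', '']
```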
I'm using this command to evaluate Pass@1:

```
$ python evaluation.py --source_file_name GPT-4-Turbo_class_H_greedy --eval_data ClassEval_data --greedy 1
```
```
{
    'class_partial_success': 0.58,
    'class_success': 0.37,
    'fun_partial_success': 0.8047808764940239,
    'fun_success': 0.6613545816733067
}
```
...
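With greedy decoding (one sample per task), Pass@1 reduces to the fraction of tasks whose output passes all tests. For sampled generation, code benchmarks commonly use the unbiased pass@k estimator from the Codex paper; a minimal sketch, not necessarily what evaluation.py does internally:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per task, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # 0.3
```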