
Benchmark ClassEval for class-level code generation.

Results: 8 ClassEval issues

This is a good benchmark; thank you for it. Do you plan to add modern models such as Opus, llama-3, granite, codeqwen1.5-chat, and so on to the benchmark?

Hi, thank you very much for sharing this benchmark and all the hard work! I have a question regarding the test passing rates of the generated code. I followed the...

The following class names did not map directly to the benchmark code and test file names and required the mapping below:

```python
SOLUTION_MAP = {
    "BankAccount": "Bank_Account_System",
    "Classroom": "ClassroomManagement",
    "ClassRegistrationSystem": "Class_Registration_System",
    "DatabaseProcessor": ...
```
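One way to apply such a mapping is a small lookup that falls back to the class name itself when no remapping is needed. This is only a sketch; `resolve_file_name` is a hypothetical helper, not part of the ClassEval code:

```python
# Partial mapping taken from the issue above; entries after these are truncated there.
SOLUTION_MAP = {
    "BankAccount": "Bank_Account_System",
    "Classroom": "ClassroomManagement",
    "ClassRegistrationSystem": "Class_Registration_System",
}

def resolve_file_name(class_name):
    """Map a generated class name to its benchmark file name (hypothetical helper)."""
    # Most class names match their file names directly; only the exceptions are remapped.
    return SOLUTION_MAP.get(class_name, class_name)

print(resolve_file_name("BankAccount"))  # remapped
print(resolve_file_name("AreaCalculator"))  # unchanged
```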

I was able to achieve 100% with the benchmark code after disabling the following tests and patching test 56. Some of the tasks have an explicit test case while others...
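Disabling individual tests without editing their source can be done by wrapping them with `unittest.skip` before the suite runs. A minimal sketch with made-up test names (the actual ClassEval tests referenced above are not shown):

```python
import unittest

class DemoTests(unittest.TestCase):
    def test_kept(self):
        self.assertEqual(1 + 1, 2)

    def test_disabled(self):
        self.assertEqual(0, 1)  # stands in for a benchmark test we want to disable

# Disable a test by name: unittest.skip marks the method so the runner skips it.
DemoTests.test_disabled = unittest.skip("disabled for evaluation")(DemoTests.test_disabled)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(DemoTests)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped), len(result.failures))  # → 2 1 0
```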

Question about Dataset Labels

Hi, I have a question regarding the labels in the dataset. Does this dataset include labels for the following topics?
- Management Systems
- Data Formatting...

In the AccessGatewayFilter skeleton, there is this method:

```python
def is_start_with(self, request_uri):
    """
    Check if the request URI starts with certain prefixes.
    :param request_uri: str, the URI of the request...
```
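For context, a plausible completion of that skeleton looks like the following. The prefix list here is an assumption for illustration; the benchmark's canonical solution may use different prefixes:

```python
class AccessGatewayFilter:
    def is_start_with(self, request_uri):
        """
        Check if the request URI starts with certain prefixes.
        :param request_uri: str, the URI of the request
        :return: bool, True if the URI starts with one of the prefixes
        """
        start_list = ['/api', '/login']  # assumed prefixes, not the benchmark's actual list
        for prefix in start_list:
            if request_uri.startswith(prefix):
                return True
        return False

f = AccessGatewayFilter()
print(f.is_start_with('/api/users'))  # True
print(f.is_start_with('/admin'))      # False
```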

```python
def test_interpret_6(self):
    context = AutomaticGuitarSimulator(" ")
    play_list = context.interpret()
    self.assertEqual(play_list, [{'Chord': '', 'Tune': ''}, {'Chord': '', 'Tune': ''}])  # better to remove this test
```

```python
def test_interpret_9(self):
    context...
```

I'm using this command to evaluate Pass@1:

```
$ python evaluation.py --source_file_name GPT-4-Turbo_class_H_greedy --eval_data ClassEval_data --greedy 1
```

```
{
    'class_partial_success': 0.58,
    'class_success': 0.37,
    'fun_partial_success': 0.8047808764940239,
    'fun_success': 0.6613545816733067
}
```
...