ClassEval
ClassEval is a benchmark for class-level code generation.
This is a good benchmark, thank you for that. Do you plan to add modern models like Opus, llama-3, granite, codeqwen1.5-chat and so on to the benchmark?
Hi, thank you very much for sharing this benchmark and all the hard work! I have a question regarding the test passing rates on the generated code. I followed the...
The following class names did not map directly to benchmark code and test file names and required the mapping below:

```python
SOLUTION_MAP = {
    "BankAccount": "Bank_Account_System",
    "Classroom": "ClassroomManagement",
    "ClassRegistrationSystem": "Class_Registration_System",
    "DatabaseProcessor": ...
```
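A minimal sketch of how such a remapping might be applied when locating a task's code and test files; the helper name, file layout, and file-name suffixes here are assumptions for illustration, not taken from the post:

```python
import os

# Abbreviated version of the post's mapping, for a self-contained example.
SOLUTION_MAP = {"BankAccount": "Bank_Account_System"}

# Hypothetical helper: resolve a generated class name to its benchmark
# file stem, falling back to the class name when no remapping is needed.
def resolve_benchmark_name(class_name: str) -> str:
    return SOLUTION_MAP.get(class_name, class_name)

stem = resolve_benchmark_name("BankAccount")             # -> "Bank_Account_System"
code_file = os.path.join("benchmark", f"{stem}.py")      # assumed layout
test_file = os.path.join("benchmark", f"{stem}_test.py")
print(code_file, test_file)
```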
I was able to achieve 100% with the benchmark code after disabling the following tests and patching test 56. Some of the tasks have an explicit test case while others...
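For context, one common way to disable an individual test in a unittest suite is the @unittest.skip decorator; a minimal sketch with placeholder names, not the actual ClassEval test classes:

```python
import unittest

class ExampleTaskTest(unittest.TestCase):
    @unittest.skip("disabled: asserts implementation-specific behavior")
    def test_case_6(self):
        self.fail("never runs")

    def test_case_1(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()
```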
Question about Dataset Labels
Hi, I have a question regarding the labels in the dataset. Does this dataset include labels for the following topics?
- Management Systems
- Data Formatting...
In the AccessGatewayFilter skeleton, there is this method:

```python
def is_start_with(self, request_uri):
    """
    Check if the request URI starts with certain prefixes.
    :param request_uri: str, the URI of the request...
```
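Based on the docstring alone, a plausible implementation looks like the sketch below; the prefix whitelist is an assumption, and the actual ClassEval reference solution may check different prefixes:

```python
class AccessGatewayFilter:
    def is_start_with(self, request_uri):
        """
        Check if the request URI starts with certain prefixes.
        :param request_uri: str, the URI of the request
        :return: bool, True if the URI starts with an allowed prefix
        """
        start_with = ("/api", "/login")  # assumed whitelist for illustration
        return request_uri.startswith(start_with)
```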
```python
def test_interpret_6(self):
    context = AutomaticGuitarSimulator(" ")
    play_list = context.interpret()
    self.assertEqual(play_list, [{'Chord': '', 'Tune': ''}, {'Chord': '', 'Tune': ''}])  # better to remove this test
```
```python
def test_interpret_9(self):
    context ...
```
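The asserted output here likely falls out of naive whitespace splitting: splitting a lone space on " " yields two empty tokens, which would map to two empty Chord/Tune entries. This is an inference about the reference implementation, not something stated in the thread:

```python
# Splitting a single space on " " produces two empty strings,
# which would explain the two empty {'Chord': '', 'Tune': ''} entries.
print(" ".split(" "))  # ['', '']
```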
I'm using this command to evaluate Pass@1:

```
$ python evaluation.py --source_file_name GPT-4-Turbo_class_H_greedy --eval_data ClassEval_data --greedy 1
```
```
{
    'class_partial_success': 0.58,
    'class_success': 0.37,
    'fun_partial_success': 0.8047808764940239,
    'fun_success': 0.6613545816733067
}
```
...
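With greedy decoding (one sample per task), Pass@1 reduces to the fraction of tasks whose output passes all tests. For sampled generation, code benchmarks commonly use the unbiased pass@k estimator from the Codex paper; a minimal sketch, not necessarily what evaluation.py does internally:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples per task, c of them correct."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # 0.3
```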