Automate regression test checking
What problem or use case are you trying to solve?
We can run the regression tests with ./evaluation/regression/run.sh, but it's hard to tell how each agent does, or whether it actually accomplishes each task.
Describe the UX of the solution you'd like
I'd like to see a score of how many tests each agent completed successfully.
Do you have thoughts on the technical implementation?
We should add a test.sh to each test case and expect it to exit 0 on success.
Each agent should then get a score of how many tests it passed.
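For illustration, here is a minimal sketch of how the scoring could work, assuming each test case lives in its own directory under evaluation/regression/cases (the directory name is hypothetical, not the actual repo layout) and ships a test.sh that exits 0 when the agent succeeded:

```bash
#!/usr/bin/env bash
# Hypothetical scoring loop: run each test case's test.sh and count how many
# exit 0. Paths below are assumptions for the sketch, not the real layout.

CASES_DIR="evaluation/regression/cases"   # assumed location of test cases
passed=0
total=0

for case_dir in "$CASES_DIR"/*/; do
    test_script="${case_dir}test.sh"
    [ -f "$test_script" ] || continue
    total=$((total + 1))
    if bash "$test_script"; then
        passed=$((passed + 1))
    else
        echo "FAIL: $case_dir"
    fi
done

echo "Score: $passed/$total tests passed"
```

run.sh could print a summary like this at the end of each agent's run, so regressions are visible at a glance and agents can be compared by their score.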
Describe alternatives you've considered
Additional context