Add SQL Metrics Implementation
Pull Request Description
Summary
This pull request introduces a new SQL AST comparison metric to the continuous-eval repository. The new metric, SQLASTSimilarity, compares SQL queries using Abstract Syntax Tree (AST) similarity, leveraging the sqlglot library.
Changes
- Added the `SQLASTSimilarity` class to the `code_deterministic_metrics.py` file.
- Imported the `diff` and `parse_one` functions from the `sqlglot` library.
- Imported the `Keep` class from the `sqlglot.diff` module.
- Implemented the `__call__` method in the `SQLASTSimilarity` class to parse SQL queries into ASTs and calculate similarity scores.
- Implemented the `_calculate_similarity` method in the `SQLASTSimilarity` class to calculate the similarity score between two ASTs by using the `diff` function to get the differences between the trees, counting the total changes, and calculating the total number of nodes in both trees. The similarity score is calculated as `1 - (total_changes / total_nodes)`.
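For reviewers, the scoring described above can be sketched roughly as follows. This is an illustration, not the code in this PR: the function names `similarity_from_counts` and `sql_ast_similarity` are hypothetical, and two details are assumptions on my part, namely that every edit other than `Keep` counts as a change and that nodes are counted by walking each tree.

```python
def similarity_from_counts(total_changes: int, total_nodes: int) -> float:
    """Score from the description: 1 - (total_changes / total_nodes)."""
    if total_nodes == 0:
        # Assumption: two empty trees are treated as identical.
        return 1.0
    return 1 - (total_changes / total_nodes)


def sql_ast_similarity(sql_a: str, sql_b: str) -> float:
    """Hypothetical helper: parse both queries and score their AST diff."""
    # Imported lazily so the formula helper above works without sqlglot.
    from sqlglot import diff, parse_one
    from sqlglot.diff import Keep

    tree_a = parse_one(sql_a)
    tree_b = parse_one(sql_b)

    # diff() returns an edit script (Insert, Remove, Move, Update, Keep).
    # Assumption: anything that is not a Keep counts as one change.
    edits = diff(tree_a, tree_b)
    total_changes = sum(1 for edit in edits if not isinstance(edit, Keep))

    # Assumption: the node total is the sum of the nodes in both trees.
    total_nodes = len(list(tree_a.walk())) + len(list(tree_b.walk()))

    return similarity_from_counts(total_changes, total_nodes)
```

Under these assumptions, two identical queries produce only `Keep` edits and score 1.0, while each inserted, removed, moved, or updated node lowers the score.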
Testing
- Created a new test file, `test_code_deterministic_metrics.py`, with unit tests for the `SQLASTSimilarity` class.
- Added test methods to validate the functionality of the `SQLASTSimilarity` class, including tests for exact match, different queries, similar queries, and invalid queries.
- Ran the tests using `pytest`, and all tests passed successfully.
Link to Devin run
https://preview.devin.ai/devin/696032ba45654233968d6a04f2bc5df3
Request for Review
Please review the changes and provide feedback. If everything looks good, kindly approve the pull request for merging.
Thank you!
This is a cross repository pull request, but Ellipsis isn't installed in yisz/continuous-eval-locale. In order to have Ellipsis address comments in this PR, you'll need to install Ellipsis in that repository.
@pantonante check to see if Devin's work is good enough. It added tests / documentation as well.