Implement SQL AST comparison metric
Pull Request Description
Summary
This pull request introduces a new SQL AST comparison metric to the continuous-eval repository. The new metric, `SQLASTSimilarity`, compares SQL queries using Abstract Syntax Tree (AST) similarity, leveraging the `sqlglot` library.
Changes
- Added the `SQLASTSimilarity` class to the `code_deterministic_metrics.py` file.
- Imported the `diff` and `parse_one` functions from the `sqlglot` library.
- Imported the `Keep` class from the `sqlglot.diff` module.
- Implemented the `__call__` method in the `SQLASTSimilarity` class to parse SQL queries into ASTs and calculate similarity scores.
- Implemented the `_calculate_similarity` method in the `SQLASTSimilarity` class to calculate the similarity score between two ASTs by using the `diff` function to get the differences between the trees, counting the total changes, and calculating the total number of nodes in both trees. The similarity score is calculated as `1 - (total_changes / total_nodes)`.
Testing
- Created a new test file, `test_code_deterministic_metrics.py`, with unit tests for the `SQLASTSimilarity` class.
- Added test methods to validate the functionality of the `SQLASTSimilarity` class, including tests for exact match, different queries, similar queries, and invalid queries.
- Ran the tests using `pytest`, and all tests passed successfully.
Link to Devin run
https://preview.devin.ai/devin/696032ba45654233968d6a04f2bc5df3
Request for Review
Please review the changes and provide feedback. If everything looks good, kindly approve the pull request for merging.
Thank you!