great_expectations
great_expectations copied to clipboard
[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set'
[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' sqlalchemy dataset.py file
Changes proposed in this pull request:
- Allows users to make use of the pairwise expectation 'expect_column_pair_values_to_be_in_set' for SQLAlchemy backends in v2 GE -Provides the 'column_pair_map_expectation 'decorator for other pairwise expectations to be developed for SQLAlchemy backends in v2 GE
After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.
Definition of Done
Please delete options that are not relevant.
- [x] My code follows the Great Expectations style guide
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have made corresponding changes to the documentation
- [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
- [ ] I have run any local integration tests and made sure that nothing is broken.
Deploy request for niobium-lead-7998 pending review.
Visit the deploys page to approve it
Name | Link |
---|---|
Latest commit | 348bba3f328199a83a51737d2d8d1eb6ec91a416 |
Hi @Arnavkar! Thank you for contributing to Great Expectations! I'll be taking a look at this contribution. It looks like the linter is picking up some issues, would you mind running invoke hooks
from the repo root & merging develop into your branch? That will run all of our automated linting.
Hi @Arnavkar! I was able to make commits to your branch for satisfying the linter and merging develop. When you have a moment would you be able to sign the CLA? I'll submit my code review soon as well.
Hi Anthony,
Sorry for the slow response β Thank you for reaching out and making those fixes! I believe I've already submitted the CLA already, that's via the google form correct? Let me know if I should go ahead and submit it once more.
Thank you!
Regards, Arnav
On Thu, Oct 20, 2022 at 11:17 AM Anthony Burdi @.***> wrote:
Hi @Arnavkar https://github.com/Arnavkar! I was able to make commits to your branch for satisfying the linter and merging develop. When you have a moment would you be able to sign the CLA? I'll submit my code review soon as well.
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1285723531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNRHB6YRI2JFMSIHJ2TWEFPBZANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
@cla-bot check
Hi @Arnavkar! If you don't mind can you also sign with [email protected]? I am just seeing your .edu email address. If that doesn't work we can override.
Right now I'm enabling tests for the new expectation and verifying.
Hi Anthony,
Thanks a ton! Iβll go ahead and submit the form once more with my Morningstar email address, let me know if / when you see it on your end ππΌ
Regards, Arnav
On Fri, Oct 21, 2022 at 12:21 PM Anthony Burdi @.***> wrote:
@.**** approved this pull request.
Thank you for this contribution! I made one small edit to enable the tests for the new expectation and ran the linter, but otherwise LGTM!
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#pullrequestreview-1151233201, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNS2LSG5VB7Z2GID443WEK7H3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.*** com>
Hi @Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?
Sure, Iβll take a look later today ππΌ
Regards, Arnav
On Mon, Oct 24, 2022 at 11:29 AM Anthony Burdi @.***> wrote:
Hi @Arnavkar https://github.com/Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1289212019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNUGNF77WTSWD5JKPBDWE2TL3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
Hi Anthony,
Would you be able to share the command that you're running/ which of the test cases are failing? All the tests on my end seem to be passing successfully. Thank you!
Regards, Arnav
On Mon, Oct 24, 2022 at 12:18 PM Arnav Shirodkar @.***> wrote:
Sure, Iβll take a look later today ππΌ
Regards, Arnav
On Mon, Oct 24, 2022 at 11:29 AM Anthony Burdi @.***> wrote:
Hi @Arnavkar https://github.com/Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1289212019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNUGNF77WTSWD5JKPBDWE2TL3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
Hi @Arnavkar
Are you able to see this link of our CI pipeline? I'm looking at the details of the automated checks at the bottom of this PR. Specifically the failing dev (db_integration mssql)
stage. No worries if you are not able to view that link. It looks like the following tests are failing for mssql:
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:positive_test_with_nulls_and_ignore_row_if_either_value_is_missing]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:negative_test_with_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:another_positive_test_with_nulls]
The test command that is being run in that CI pipeline is pytest tests --mssql
. There are a few other directives to ignore other tests - you can use --ignore 'tests/cli' --ignore 'contrib/cli/tests' --ignore 'tests/integration/usage_statistics'
to do the same. You can also just run the failing tests. To run these tests will require setting up an mssql instance, e.g. running docker compose up -d
in the great_expectations/assets/docker/mssql
folder.
Hi Anthony,
I'm trying to run the mssql tests on my end, but I'm constantly unable to connect to the mssql db with the following error message:
pyodbc.OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')
Any thoughts?
Arnav
On Tue, Oct 25, 2022 at 10:16 AM Anthony Burdi @.***> wrote:
Hi @Arnavkar https://github.com/Arnavkar Are you able to see this link of our CI pipeline https://dev.azure.com/great-expectations/great_expectations/_build/results?buildId=113376&view=logs&j=5c55b4fa-9ba6-56aa-e1bf-dbaa4c84fec3&t=84b228f2-b0aa-5893-743e-5e39f9043c2d? I'm looking at the details of the automated checks at the bottom of this PR. Specifically the failing dev (db_integration mssql) stage. No worries if you are not able to view that link. It looks like the following tests are failing for mssql:
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:positive_test_with_nulls_and_ignore_row_if_either_value_is_missing] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:negative_test_with_nulls] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:another_positive_test_with_nulls]
The test command that is being run in that CI pipeline is pytest tests --mssql. There are a few other directives to ignore other tests - you can use --ignore 'tests/cli' --ignore 'contrib/cli/tests' --ignore 'tests/integration/usage_statistics' to do the same. You can also just run the failing tests. To run these tests will require setting up an mssql instance, e.g. running docker compose up -d in the great_expectations/assets/docker/mssql folder.
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1290630935, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNW6HZYJ2SAABPRW2STWE7TU3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
@Arnavkar I'm having trouble as well, but my error leads me to think the database is not being created when running the docker compose script. We create the database within our CI pipeline so let me see if I can get it created when creating the docker container so we can debug the failing tests locally. In the worst case, we can skip support for mssql for this expectation if it is too involved and tackle that in a subsequent PR.
@Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci
database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:
- Run
docker compose up -d
in thegreat_expectations/assets/docker/mssql
folder of the repository - Run
/opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;"
in the docker container CLI (you can use a command or docker desktop) - Run one of the failing tests via
pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]"
, replace the test name in quotes after the -k to run other tests.
Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:
E sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)")
E [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition
E FROM [test_data_N74VdOGu]]
E [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)]
E (Background on this error at: https://sqlalche.me/e/14/f405)
../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError
Hey Anthony,
Thanks for all your help with this! Let me give it a second shot tonight and Iβll update on any progress made. Keep you posted ππΌ
Regards, Arnav
On Tue, Oct 25, 2022 at 6:20 PM Anthony Burdi @.***> wrote:
@Arnavkar https://github.com/Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:
- Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
- Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
- Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.
Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:
E sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)") E [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition E FROM [test_data_N74VdOGu]] E [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)] E (Background on this error at: https://sqlalche.me/e/14/f405)
../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1291203027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNQK77OK2TE6TSGRC6DWFBMJPANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
Hi Anthony,
Gave your above steps a shot but it is still giving me the same error with the login timeout, even after I verified that the test_ci database had already been created on mssql within the container. Would it make sense to limit this just to mysql in that case?
Arnav
On Tue, Oct 25, 2022 at 6:59 PM Arnav Shirodkar @.***> wrote:
Hey Anthony,
Thanks for all your help with this! Let me give it a second shot tonight and Iβll update on any progress made. Keep you posted ππΌ
Regards, Arnav
On Tue, Oct 25, 2022 at 6:20 PM Anthony Burdi @.***> wrote:
@Arnavkar https://github.com/Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:
- Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
- Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
- Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.
Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:
E sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)") E [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition E FROM [test_data_N74VdOGu]] E [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)] E (Background on this error at: https://sqlalche.me/e/14/f405)
../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1291203027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNQK77OK2TE6TSGRC6DWFBMJPANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>
Yes I think that is a good plan. Thank you for trying! I also spent a few hours trying to get mssql to work as well and I was not successful. I will try to spend some time today or tomorrow adding in the skip mssql logic. It's a little tricky to explain (we need to only skip mssql for v2 and not v3) so I may just add it if that is OK with you.
I also hope to add a readme to the mssql docker compose folder with the command to create the test_ci database for easier set up in the future. (I tried to add this command to the docker_compose file, but I could not get it working). Submitted separately: https://github.com/great-expectations/great_expectations/pull/6211
Thatβs totally fine with me - This contribution Is also with the understanding that GE is moving away from v2 entirely so Iβm fine with it as is.
Thanks again for all your help Anthony!
Arnav
On Wed, Oct 26, 2022 at 10:01 AM Anthony Burdi @.***> wrote:
Yes I think that is a good plan. Thank you for trying! I also spent a few hours trying to get mssql to work as well and I was not successful. I will try to spend some time today or tomorrow adding in the skip mssql logic. It's a little tricky to explain (we need to only skip mssql for v2 and not v3) so I may just add it if that is OK with you. I also hope to add a readme to the mssql docker compose folder with the command to create the test_ci database for easier set up in the future. (I tried to add this command to the docker_compose file, but I could not get it working).
β Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1292099227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNRQJVWTT24UANUV46TWFE2SBANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>