great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set'

Open Arnavkar opened this issue 2 years ago β€’ 2 comments

[FEATURE]/ Added pairwise expectation 'expect_column_pair_values_to_be_in_set' sqlalchemy dataset.py file

Changes proposed in this pull request:

  • Allows users to make use of the pairwise expectation 'expect_column_pair_values_to_be_in_set' for SQLAlchemy backends in v2 GE -Provides the 'column_pair_map_expectation 'decorator for other pairwise expectations to be developed for SQLAlchemy backends in v2 GE

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

Definition of Done

Please delete options that are not relevant.

  • [x] My code follows the Great Expectations style guide
  • [x] I have performed a self-review of my own code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [x] I have made corresponding changes to the documentation
  • [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
  • [ ] I have run any local integration tests and made sure that nothing is broken.

Arnavkar avatar Sep 27 '22 04:09 Arnavkar

Deploy request for niobium-lead-7998 pending review.

Visit the deploys page to approve it

Name Link
Latest commit 348bba3f328199a83a51737d2d8d1eb6ec91a416

netlify[bot] avatar Sep 27 '22 04:09 netlify[bot]

Hi @Arnavkar! Thank you for contributing to Great Expectations! I'll be taking a look at this contribution. It looks like the linter is picking up some issues, would you mind running invoke hooks from the repo root & merging develop into your branch? That will run all of our automated linting.

anthonyburdi avatar Oct 19 '22 21:10 anthonyburdi

Hi @Arnavkar! I was able to make commits to your branch for satisfying the linter and merging develop. When you have a moment would you be able to sign the CLA? I'll submit my code review soon as well.

anthonyburdi avatar Oct 20 '22 15:10 anthonyburdi

Hi Anthony,

Sorry for the slow response β€” Thank you for reaching out and making those fixes! I believe I've already submitted the CLA already, that's via the google form correct? Let me know if I should go ahead and submit it once more.

Thank you!

Regards, Arnav

On Thu, Oct 20, 2022 at 11:17 AM Anthony Burdi @.***> wrote:

Hi @Arnavkar https://github.com/Arnavkar! I was able to make commits to your branch for satisfying the linter and merging develop. When you have a moment would you be able to sign the CLA? I'll submit my code review soon as well.

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1285723531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNRHB6YRI2JFMSIHJ2TWEFPBZANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 20 '22 19:10 Arnavkar

@cla-bot check

anthonyburdi avatar Oct 21 '22 15:10 anthonyburdi

Hi @Arnavkar! If you don't mind can you also sign with [email protected]? I am just seeing your .edu email address. If that doesn't work we can override.

Right now I'm enabling tests for the new expectation and verifying.

anthonyburdi avatar Oct 21 '22 16:10 anthonyburdi

Hi Anthony,

Thanks a ton! I’ll go ahead and submit the form once more with my Morningstar email address, let me know if / when you see it on your end πŸ‘πŸΌ

Regards, Arnav

On Fri, Oct 21, 2022 at 12:21 PM Anthony Burdi @.***> wrote:

@.**** approved this pull request.

Thank you for this contribution! I made one small edit to enable the tests for the new expectation and ran the linter, but otherwise LGTM!

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#pullrequestreview-1151233201, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNS2LSG5VB7Z2GID443WEK7H3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.*** com>

Arnavkar avatar Oct 21 '22 16:10 Arnavkar

Hi @Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?

anthonyburdi avatar Oct 24 '22 15:10 anthonyburdi

Sure, I’ll take a look later today πŸ‘πŸΌ

Regards, Arnav

On Mon, Oct 24, 2022 at 11:29 AM Anthony Burdi @.***> wrote:

Hi @Arnavkar https://github.com/Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1289212019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNUGNF77WTSWD5JKPBDWE2TL3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 24 '22 16:10 Arnavkar

Hi Anthony,

Would you be able to share the command that you're running/ which of the test cases are failing? All the tests on my end seem to be passing successfully. Thank you!

Regards, Arnav

On Mon, Oct 24, 2022 at 12:18 PM Arnav Shirodkar @.***> wrote:

Sure, I’ll take a look later today πŸ‘πŸΌ

Regards, Arnav

On Mon, Oct 24, 2022 at 11:29 AM Anthony Burdi @.***> wrote:

Hi @Arnavkar https://github.com/Arnavkar, thank you for signing the CLA, it looks like all went well there. I am noticing that after I enabled testing, the test for mssql is failing, is that something you would be able to look into?

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1289212019, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNUGNF77WTSWD5JKPBDWE2TL3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 24 '22 19:10 Arnavkar

Hi @Arnavkar Are you able to see this link of our CI pipeline? I'm looking at the details of the automated checks at the bottom of this PR. Specifically the failing dev (db_integration mssql) stage. No worries if you are not able to view that link. It looks like the following tests are failing for mssql:

FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:positive_test_with_nulls_and_ignore_row_if_either_value_is_missing]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:negative_test_with_nulls]
FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:another_positive_test_with_nulls]

The test command that is being run in that CI pipeline is pytest tests --mssql. There are a few other directives to ignore other tests - you can use --ignore 'tests/cli' --ignore 'contrib/cli/tests' --ignore 'tests/integration/usage_statistics' to do the same. You can also just run the failing tests. To run these tests will require setting up an mssql instance, e.g. running docker compose up -d in the great_expectations/assets/docker/mssql folder.

anthonyburdi avatar Oct 25 '22 14:10 anthonyburdi

Hi Anthony,

I'm trying to run the mssql tests on my end, but I'm constantly unable to connect to the mssql db with the following error message:

pyodbc.OperationalError: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 17 for SQL Server]Login timeout expired (0) (SQLDriverConnect)')

Any thoughts?

Arnav

On Tue, Oct 25, 2022 at 10:16 AM Anthony Burdi @.***> wrote:

Hi @Arnavkar https://github.com/Arnavkar Are you able to see this link of our CI pipeline https://dev.azure.com/great-expectations/great_expectations/_build/results?buildId=113376&view=logs&j=5c55b4fa-9ba6-56aa-e1bf-dbaa4c84fec3&t=84b228f2-b0aa-5893-743e-5e39f9043c2d? I'm looking at the details of the automated checks at the bottom of this PR. Specifically the failing dev (db_integration mssql) stage. No worries if you are not able to view that link. It looks like the following tests are failing for mssql:

FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:positive_test_with_nulls_and_ignore_row_if_either_value_is_missing] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:negative_test_with_nulls] FAILED tests/test_definitions/test_expectations.py::test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:another_positive_test_with_nulls]

The test command that is being run in that CI pipeline is pytest tests --mssql. There are a few other directives to ignore other tests - you can use --ignore 'tests/cli' --ignore 'contrib/cli/tests' --ignore 'tests/integration/usage_statistics' to do the same. You can also just run the failing tests. To run these tests will require setting up an mssql instance, e.g. running docker compose up -d in the great_expectations/assets/docker/mssql folder.

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1290630935, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNW6HZYJ2SAABPRW2STWE7TU3ANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 25 '22 15:10 Arnavkar

@Arnavkar I'm having trouble as well, but my error leads me to think the database is not being created when running the docker compose script. We create the database within our CI pipeline so let me see if I can get it created when creating the docker container so we can debug the failing tests locally. In the worst case, we can skip support for mssql for this expectation if it is too involved and tackle that in a subsequent PR.

anthonyburdi avatar Oct 25 '22 21:10 anthonyburdi

@Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:

  1. Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
  2. Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
  3. Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.

Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:

E       sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)")
E       [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition
E       FROM [test_data_N74VdOGu]]
E       [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)]
E       (Background on this error at: https://sqlalche.me/e/14/f405)

../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError

anthonyburdi avatar Oct 25 '22 22:10 anthonyburdi

Hey Anthony,

Thanks for all your help with this! Let me give it a second shot tonight and I’ll update on any progress made. Keep you posted πŸ‘πŸΌ

Regards, Arnav

On Tue, Oct 25, 2022 at 6:20 PM Anthony Burdi @.***> wrote:

@Arnavkar https://github.com/Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:

  1. Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
  2. Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
  3. Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.

Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:

E sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)") E [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition E FROM [test_data_N74VdOGu]] E [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)] E (Background on this error at: https://sqlalche.me/e/14/f405)

../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1291203027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNQK77OK2TE6TSGRC6DWFBMJPANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 25 '22 22:10 Arnavkar

Hi Anthony,

Gave your above steps a shot but it is still giving me the same error with the login timeout, even after I verified that the test_ci database had already been created on mssql within the container. Would it make sense to limit this just to mysql in that case?

Arnav

On Tue, Oct 25, 2022 at 6:59 PM Arnav Shirodkar @.***> wrote:

Hey Anthony,

Thanks for all your help with this! Let me give it a second shot tonight and I’ll update on any progress made. Keep you posted πŸ‘πŸΌ

Regards, Arnav

On Tue, Oct 25, 2022 at 6:20 PM Anthony Burdi @.***> wrote:

@Arnavkar https://github.com/Arnavkar I was able to run the tests locally using the docker compose file to spin up a mssql database. I did have to manually create the test_ci database expected by our test suite, I tried a few things to add this to the docker compose without success. The steps I took were as follows:

  1. Run docker compose up -d in the great_expectations/assets/docker/mssql folder of the repository
  2. Run /opt/mssql-tools/bin/sqlcmd -U sa -P "ReallyStrongPwd1234%^&*" -Q "CREATE DATABASE test_ci;" in the docker container CLI (you can use a command or docker desktop)
  3. Run one of the failing tests via pytest --mssql tests/test_definitions/test_expectations.py -k "test_case_runner[mssql/column_pair_map_expectations/expect_column_pair_values_to_be_in_set:basic_positive_test_without_nulls]", replace the test name in quotes after the -k to run other tests.

Are you able to try this to see if it works for you? I am seeing the same failures as in the CI pipeline, namely the following for the test referenced in the command above:

E sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'IS'. (156) (SQLExecDirectW)") E [SQL: INSERT INTO [#ge_temp_80395bba] (condition) SELECT CASE WHEN (NOT (x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL OR x = ? AND (x = ?) IS NOT NULL AND x = ? AND (x = ?) IS NOT NULL) AND NOT (x IS NULL AND x IS NULL)) THEN ? ELSE ? END AS condition E FROM [test_data_N74VdOGu]] E [parameters: (1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9, 9, 10, 10, 10, 10, 1, 0)] E (Background on this error at: https://sqlalche.me/e/14/f405)

../../../../miniconda3/envs/pr_6097/lib/python3.8/site-packages/sqlalchemy/engine/default.py:719: ProgrammingError

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1291203027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNQK77OK2TE6TSGRC6DWFBMJPANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 26 '22 01:10 Arnavkar

Yes I think that is a good plan. Thank you for trying! I also spent a few hours trying to get mssql to work as well and I was not successful. I will try to spend some time today or tomorrow adding in the skip mssql logic. It's a little tricky to explain (we need to only skip mssql for v2 and not v3) so I may just add it if that is OK with you.

I also hope to add a readme to the mssql docker compose folder with the command to create the test_ci database for easier set up in the future. (I tried to add this command to the docker_compose file, but I could not get it working). Submitted separately: https://github.com/great-expectations/great_expectations/pull/6211

anthonyburdi avatar Oct 26 '22 14:10 anthonyburdi

That’s totally fine with me - This contribution Is also with the understanding that GE is moving away from v2 entirely so I’m fine with it as is.

Thanks again for all your help Anthony!

Arnav

On Wed, Oct 26, 2022 at 10:01 AM Anthony Burdi @.***> wrote:

Yes I think that is a good plan. Thank you for trying! I also spent a few hours trying to get mssql to work as well and I was not successful. I will try to spend some time today or tomorrow adding in the skip mssql logic. It's a little tricky to explain (we need to only skip mssql for v2 and not v3) so I may just add it if that is OK with you. I also hope to add a readme to the mssql docker compose folder with the command to create the test_ci database for easier set up in the future. (I tried to add this command to the docker_compose file, but I could not get it working).

β€” Reply to this email directly, view it on GitHub https://github.com/great-expectations/great_expectations/pull/6097#issuecomment-1292099227, or unsubscribe https://github.com/notifications/unsubscribe-auth/AP6UTNRQJVWTT24UANUV46TWFE2SBANCNFSM6AAAAAAQWMQNRM . You are receiving this because you were mentioned.Message ID: @.***>

Arnavkar avatar Oct 26 '22 16:10 Arnavkar