great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

Subject: Support to include ID/PK in validation result for each row t…

Open abekfenn opened this issue 2 years ago • 5 comments

…hat failed an expectations

Ticket Number: #3195

Problem:

  • No arguments available to configure column subset of unexpected_rows

Solution:

  • Add args to subset unexpected_rows

Note:

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

After submitting your PR, CI checks will run and @cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • [ ] My code follows the Great Expectations style guide
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have added unit tests where applicable and made sure that new and existing tests are passing.
  • [ ] I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

abekfenn avatar Aug 30 '22 05:08 abekfenn

Deploy Preview for niobium-lead-7998 ready!

Name Link
Latest commit 2ff38b95f30114471d274d44463fc99ac51558bb
Latest deploy log https://app.netlify.com/sites/niobium-lead-7998/deploys/6372cb297531e70008ddfb65
Deploy Preview https://deploy-preview-5876--niobium-lead-7998.netlify.app
Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

netlify[bot] avatar Aug 30 '22 05:08 netlify[bot]

Looks like we'll need a linting pass as well!

austiezr avatar Aug 31 '22 16:08 austiezr

Thanks @austiezr your feedback was very helpful. I'm definitely much closer now and was looking to add a test for an expectation that leverages the SqlAlchemyExecutionEngine but I don't see any existing tests or datasets that I could reference. I assume they're out there, would you be able to point me to an example test or dataset that I could use to test an expectation with these args?

abekfenn avatar Sep 01 '22 02:09 abekfenn

Hey @abekfenn ! Have you had an opportunity to check out these tests?

austiezr avatar Sep 01 '22 17:09 austiezr

@austiezr I haven't seen those tests but I was looking for tests that test metric providers and the results that different configurations and providers produce. Such as those in tests/expectations/metrics/test_map_metric.py. I'll see if I can put something together from both test files, but surprised there are no tests that test the results produced by the metric providers for SQLAlchemy datasets.

abekfenn avatar Sep 01 '22 18:09 abekfenn

Thank you very very much @abekfenn for getting the ball rolling, and all the work you put in. I was able to start where you left off, and have a PR for the Pandas implementation that is currently under review by the team. I'll post more updates here :)

Shinnnyshinshin avatar Nov 10 '22 03:11 Shinnnyshinshin

Thanks a lot for wrapping up the changes on these @Shinnnyshinshin!

I only see changes related to pandas but my understanding was that this feature (stemming from https://github.com/great-expectations/great_expectations/issues/3195#) is really needed in the SQL engine in order to support identification of the Pk/Id of a failed row. This functionality already exists for pandas in GE but is not supported for SQL. Just want to make sure that merging doesn't mean that this issue will be closed out and that we will continue work on this issue for the SQL engine implementation?

abekfenn avatar Nov 14 '22 23:11 abekfenn

Hi @abekfenn thank you for the follow-up message 😄 Yes the next steps for us are to do the implementation for SQL and Spark (we'll be sure to do SQL first). Wanting to thank you again for helping us get this far, and Ill be sure to update you once the SQL implementation has been implemented and merged.

Shinnnyshinshin avatar Nov 14 '22 23:11 Shinnnyshinshin

Awesome! Thanks so much @Shinnnyshinshin

abekfenn avatar Nov 14 '22 23:11 abekfenn