great_expectations icon indicating copy to clipboard operation
great_expectations copied to clipboard

[FEATURE] Adds `multicolumn_sum.between` Metric, `expect_multicolumn_sum_to_be_between` Expectation

Open qwwqwwq opened this issue 2 years ago • 12 comments

Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.

Changes proposed in this pull request:

  • Add a new multicolumn map metric multicolumn_sum.between
  • Add a new expectation that uses this metric, expect_multicolumn_sum_to_be_between

After submitting your PR, CI checks will run and @ge-cla-bot will check for your CLA signature.

For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.

In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123).

Previous Design Review notes:

Definition of Done

Please delete options that are not relevant.

  • [ ] My code follows the Great Expectations style guide
  • [ ] I have performed a self-review of my own code
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ x ] I have added unit tests where applicable and made sure that new and existing tests are passing.
  • [ ] I have run any local integration tests and made sure that nothing is broken.

Thank you for submitting!

qwwqwwq avatar Mar 17 '22 23:03 qwwqwwq

Deploy request for niobium-lead-7998 pending review.

Visit the deploys page to approve it

Name Link
Latest commit d8b53459e83c6b5e9bf50c252cc983d99dd76b1e

netlify[bot] avatar Mar 17 '22 23:03 netlify[bot]

Sorry I'm sure I've done a bunch of things wrong in this PR as I am new here, but basically just logically extended the already existing expect_multicolumn_sum_to_equal expectation to cover the case of asserting a value is between two values. I can sort out the contrib checklist in time etc. just wanted to see if I'm vaguely going in the right direction here.

For my use case of asserting that summing up floating point probabilities that should add to 1.0, I would like to assert that the value is 1.0 +/- epsilon

qwwqwwq avatar Mar 17 '22 23:03 qwwqwwq

Hi @qwwqwwq - thanks for opening this PR! While I haven't yet tested it, at first glance the approach does seem reasonable, and there is some precedent with other Expectation pairs (like expect_table_row_count_to_be_between and ..._to_equal, or expect_column_value_lengths_to_be_between and ..._to_equal).

I'm not sure I follow this:

For my use case of asserting that summing up floating point probabilities that should add to 1.0, I would like to assert that the value is 1.0 +/- epsilon

It also appears that you are missing a CLA, and are currently failing a linting check.

Please let me know if you have any additional questions or need additional support here.

talagluck avatar Apr 06 '22 23:04 talagluck

Hi @qwwqwwq - thanks for opening this PR! While I haven't yet tested it, at first glance the approach does seem reasonable, and there is some precedent with other Expectation pairs (like expect_table_row_count_to_be_between and ..._to_equal, or expect_column_value_lengths_to_be_between and ..._to_equal).

Great, glad to hear the approach looks reasonable.

I'm not sure I follow this:

For my use case of asserting that summing up floating point probabilities that should add to 1.0, I would like to assert that the value is 1.0 +/- epsilon

Sorry I was just explaining my use case, I have a column where it's a sum of floating point numbers, so some values are 1.0 and some are 0.9999999 repeating, I just want to assert they are all equal to 1.0 +/- 0.00001. I feel like this commonly happens when you sum probabilities in a machine learning context so I think this will be useful to others.

It also appears that you are missing a CLA, and are currently failing a linting check.

Yes I was just waiting to hear back, I'll sort all that out soon.

Please let me know if you have any additional questions or need additional support here.

Thanks!

qwwqwwq avatar Apr 07 '22 15:04 qwwqwwq

@cla-bot check

austiezr avatar Aug 02 '22 16:08 austiezr

Hey @qwwqwwq ! I love the work here; it looks like we're still waiting on a signed CLA for [email protected]. If you're able to get that resolved, we'd love to move forward with this stellar contribution!

austiezr avatar Aug 08 '22 19:08 austiezr

@cla-bot check

qwwqwwq avatar Aug 09 '22 01:08 qwwqwwq

Hey @qwwqwwq ! I love the work here; it looks like we're still waiting on a signed CLA for [email protected]. If you're able to get that resolved, we'd love to move forward with this stellar contribution!

Sorry I originally committed this from a repo where I had cloned it via ssh url (resulting in user.email not being set, and falling back to the default of using my laptop's hostname, which obviously doesn't match what I signed the CLA with). I've learned the way to remedy this is to do interactive rebase and using the edit command on all commits, which allows running git commit --amend to set the author email on the commit, which I think will satisfy the CLA bot. I've been looking at the interactive rebase menu for years and always wondered what the point of the edit command was, today I learn..

qwwqwwq avatar Aug 09 '22 01:08 qwwqwwq

A new contributor, HUZZAH! Welcome and thanks for joining our community. In order to accept a pull request we require that all contributors sign our Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to GE in a personal or professional capacity. Please sign the one that is applicable to your situation so that we may accept your contribution:

Individual Contributor License Agreement v1.0 Software Grant and Corporate Contributor License Agreement v1.0

Once you have signed the CLA, you can add a comment with the text @cla-bot check and the bot will update the PR status!

Please reach out to @kyleaton, on our Slack if you have any questions or if you have already signed the CLA and are receiving this message in error.

Users missing a CLA: [email protected]

gx-cla-bot[bot] avatar Aug 09 '22 21:08 gx-cla-bot[bot]

Hey @qwwqwwq! Tripped the cla-bot again making a final fix on some type hints. If you'll make that edit once more we'll get this over the line!

austiezr avatar Aug 09 '22 21:08 austiezr

Oh no.. I'm not sure how to fix this.. Theres no commits for me to amend as far as I can tell

(venv) (base) Jeffs-MBP:great_expectations jeffquinn$ git log | grep '[email protected]'
(venv) (base) Jeffs-MBP:great_expectations jeffquinn$

qwwqwwq avatar Aug 10 '22 13:08 qwwqwwq

Do you care if I edit the commit history? I guess I could squash it down to one commit and amend the author on that, maybe that would make this simpler to reason about

qwwqwwq avatar Aug 10 '22 13:08 qwwqwwq

Hi @qwwqwwq! Thank you so much for this contribution. For now we are closing this PR, but if you are interested in picking this work back up please reopen it! Looks like there is a small conflict to resolve and the CLA issue. To your last point, we don't mind the commit history being changed so long as the correct author is shown. Thank you!

anthonyburdi avatar Mar 22 '23 22:03 anthonyburdi