great_expectations
great_expectations copied to clipboard
[FEATURE] Adds `multicolumn_sum.between` Metric, `expect_multicolumn_sum_to_be_between` Expectation
Please annotate your PR title to describe what the PR does, then give a brief bulleted description of your PR below. PR titles should begin with [BUGFIX], [FEATURE], [DOCS], or [MAINTENANCE]. If a new feature introduces breaking changes for the Great Expectations API or configuration files, please also add [BREAKING]. You can read about the tags in our contributor checklist.
Changes proposed in this pull request:
- Add a new multicolumn map metric
multicolumn_sum.between
- Add a new expectation that uses this metric,
expect_multicolumn_sum_to_be_between
After submitting your PR, CI checks will run and @ge-cla-bot will check for your CLA signature.
For a PR with nontrivial changes, we review with both design-centric and code-centric lenses.
In a design review, we aim to ensure that the PR is consistent with our relationship to the open source community, with our software architecture and abstractions, and with our users' needs and expectations. That review often starts well before a PR, for example in github issues or slack, so please link to relevant conversations in notes below to help reviewers understand and approve your PR more quickly (e.g. closes #123
).
Previous Design Review notes:
Definition of Done
Please delete options that are not relevant.
- [ ] My code follows the Great Expectations style guide
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ x ] I have added unit tests where applicable and made sure that new and existing tests are passing.
- [ ] I have run any local integration tests and made sure that nothing is broken.
Thank you for submitting!
Deploy request for niobium-lead-7998 pending review.
Visit the deploys page to approve it
Name | Link |
---|---|
Latest commit | d8b53459e83c6b5e9bf50c252cc983d99dd76b1e |
Sorry I'm sure I've done a bunch of things wrong in this PR as I am new here, but basically just logically extended the already existing expect_multicolumn_sum_to_equal
expectation to cover the case of asserting a value is between two values. I can sort out the contrib checklist in time etc. just wanted to see if I'm vaguely going in the right direction here.
For my use case of asserting that summing up floating point probabilities that should add to 1.0
, I would like to assert that the value is 1.0 +/- epsilon
Hi @qwwqwwq - thanks for opening this PR! While I haven't yet tested it, at first glance the approach does seem reasonable, and there is some precedent with other Expectation pairs (like expect_table_row_count_to_be_between
and ..._to_equal
, or expect_column_value_lengths_to_be_between
and ..._to_equal
).
I'm not sure I follow this:
For my use case of asserting that summing up floating point probabilities that should add to 1.0, I would like to assert that the value is 1.0 +/- epsilon
It also appears that you are missing a CLA, and are currently failing a linting check.
Please let me know if you have any additional questions or need additional support here.
Hi @qwwqwwq - thanks for opening this PR! While I haven't yet tested it, at first glance the approach does seem reasonable, and there is some precedent with other Expectation pairs (like
expect_table_row_count_to_be_between
and..._to_equal
, orexpect_column_value_lengths_to_be_between
and..._to_equal
).
Great, glad to hear the approach looks reasonable.
I'm not sure I follow this:
For my use case of asserting that summing up floating point probabilities that should add to 1.0, I would like to assert that the value is 1.0 +/- epsilon
Sorry I was just explaining my use case, I have a column where it's a sum of floating point numbers, so some values are 1.0 and some are 0.9999999 repeating, I just want to assert they are all equal to 1.0 +/- 0.00001. I feel like this commonly happens when you sum probabilities in a machine learning context so I think this will be useful to others.
It also appears that you are missing a CLA, and are currently failing a linting check.
Yes I was just waiting to hear back, I'll sort all that out soon.
Please let me know if you have any additional questions or need additional support here.
Thanks!
@cla-bot check
Hey @qwwqwwq ! I love the work here; it looks like we're still waiting on a signed CLA for [email protected]
. If you're able to get that resolved, we'd love to move forward with this stellar contribution!
@cla-bot check
Hey @qwwqwwq ! I love the work here; it looks like we're still waiting on a signed CLA for
[email protected]
. If you're able to get that resolved, we'd love to move forward with this stellar contribution!
Sorry I originally committed this from a repo where I had cloned it via ssh url (resulting in user.email
not being set, and falling back to the default of using my laptop's hostname, which obviously doesn't match what I signed the CLA with). I've learned the way to remedy this is to do interactive rebase and using the edit
command on all commits, which allows running git commit --amend to set the author email on the commit, which I think will satisfy the CLA bot. I've been looking at the interactive rebase menu for years and always wondered what the point of the edit
command was, today I learn..
A new contributor, HUZZAH! Welcome and thanks for joining our community. In order to accept a pull request we require that all contributors sign our Contributor License Agreement. We have two different CLAs, depending on whether you are contributing to GE in a personal or professional capacity. Please sign the one that is applicable to your situation so that we may accept your contribution:
Individual Contributor License Agreement v1.0 Software Grant and Corporate Contributor License Agreement v1.0
Once you have signed the CLA, you can add a comment with the text @cla-bot check
and the bot will update the PR status!
Please reach out to @kyleaton, on our Slack if you have any questions or if you have already signed the CLA and are receiving this message in error.
Users missing a CLA: [email protected]
Hey @qwwqwwq! Tripped the cla-bot again making a final fix on some type hints. If you'll make that edit once more we'll get this over the line!
Oh no.. I'm not sure how to fix this.. Theres no commits for me to amend as far as I can tell
(venv) (base) Jeffs-MBP:great_expectations jeffquinn$ git log | grep '[email protected]'
(venv) (base) Jeffs-MBP:great_expectations jeffquinn$
Do you care if I edit the commit history? I guess I could squash it down to one commit and amend the author on that, maybe that would make this simpler to reason about
Hi @qwwqwwq! Thank you so much for this contribution. For now we are closing this PR, but if you are interested in picking this work back up please reopen it! Looks like there is a small conflict to resolve and the CLA issue. To your last point, we don't mind the commit history being changed so long as the correct author is shown. Thank you!