deequ
Features and enhancements done
New Features:
1. A DateTime distribution analyzer that analyzes the distribution of records over fixed time intervals, based on a 'DateType' or 'TimestampType' column. Files changed/created: DateTimeDistribution.scala, DateTimeAggregation.scala, DeequFunctions.scala ...
2. Six new constraints covering more use cases for DateTime quality checks. Files changed/created: Check.scala, Constraint.scala
3. The 'isContainedIn' constraint now supports more Scala numeric types. Files changed/created: Check.scala
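To illustrate the kind of result the distribution analyzer in the first item produces, here is a minimal sketch in plain Scala with no Spark or deequ dependency. It is not the code from DateTimeDistribution.scala: the object name, the `distribution` helper, and the choice of epoch-aligned buckets are all assumptions made for illustration.

```scala
import java.time.{Duration, Instant}

object DateTimeDistributionSketch {
  // Hypothetical helper, not part of deequ: counts records per fixed-width
  // interval. Aligning buckets to the Unix epoch is an assumption.
  def distribution(timestamps: Seq[Instant], interval: Duration): Map[Instant, Long] = {
    val widthMillis = interval.toMillis
    require(widthMillis > 0, "interval must be at least one millisecond")
    timestamps
      .groupBy { ts =>
        // floorDiv keeps pre-1970 timestamps in the correct bucket
        val bucketIndex = Math.floorDiv(ts.toEpochMilli, widthMillis)
        Instant.ofEpochMilli(bucketIndex * widthMillis)
      }
      .map { case (bucketStart, records) => bucketStart -> records.size.toLong }
  }
}
```

With an interval of `Duration.ofHours(1)`, records at 00:10, 00:50 and 01:05 UTC on the same day fall into two buckets, with counts 2 and 1.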
Enhancements:
- Issue: Timestamp support #47. A new State and Metric are implemented for this enhancement, since the previous analyzers only support Double Metric and the Standard analyzer. A new abstract analyzer for timestamp analysis is also implemented. Files changed/created: MinimumDateTime.scala, MaximumDateTime.scala (for the new analyzer implementation), Analyzer.scala
- Issue: Analyzer for Precision and Scale of BigDecimals #46. A new State and Metric are implemented, along with new analyzers that provide the precision and scale of Spark's 'DecimalType'. Files changed/created: Minimum.scala, Maximum.scala, Sum.scala, Mean.scala (for the new analyzer implementation), Analyzer.scala
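For readers unfamiliar with the two terms in #46: for a `java.math.BigDecimal`, precision is the number of digits in the unscaled value and scale is the number of digits to the right of the decimal point. The sketch below reports the largest precision and scale seen in a collection. It uses only the JDK; `DecimalShapeSketch`, `DecimalShape` and `shapeOf` are hypothetical names, and taking the maximum is an assumption, since the description above does not say how the PR aggregates these values.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalShapeSketch {
  // Hypothetical summary type, not a deequ State or Metric.
  final case class DecimalShape(maxPrecision: Int, maxScale: Int)

  // Returns None for empty input, since there is nothing to measure.
  def shapeOf(values: Seq[JBigDecimal]): Option[DecimalShape] =
    if (values.isEmpty) None
    else Some(DecimalShape(
      values.map(_.precision()).max, // digits in the unscaled value
      values.map(_.scale()).max      // digits after the decimal point
    ))
}
```

For the inputs `123.45` (precision 5, scale 2) and `0.001` (precision 1, scale 3), `shapeOf` yields a maximum precision of 5 and a maximum scale of 3.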
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Thanks! Could you look at the CI failures and fix them?
Hi, is this feature now available?
Sorry for the late reply. I'll work on it.
This is a valuable feature you've added, @Yash0215. Awesome. Looking forward to using it.
Can I assist in any way, @Yash021? I'm looking to utilise this functionality ASAP.
Please let me know if any changes are needed.
So the two enhancements implemented in this PR are essentially orthogonal, right? If so, I'd recommend we split them into separate PRs and focus on each.
Yes, that would make a lot of sense.
Hello @Yash0215 @sscdotopen, were there further developments on this work? If not, I can volunteer to take it forward.
FWIW, I forked this project and have been actively developing it since. See this changelog for the enhancements and bugfixes.
I'm also thinking of announcing it as an active fork of this project somewhere (maybe as an issue for this repo) in the near future.
Hi, thanks so much for introducing all these changes. Unfortunately, we currently don't have availability to give this a proper review. Will keep this PR in the backlog for now. If you have the opportunity to submit a couple of smaller reviews that would be great. It's hard to find the time to do big reviews and a few smaller PRs could help us understand the main ideas and make progress on this.
@Yash0215 Please get back to us on this if you get the chance. We are considering closing this PR soon.
@twollnik @Yash0215 - Will this PR be merged to improve deequ to handle timestamp/date support?
Any update on this PR to improve deequ's timestamp/date support?
This PR is quite big, with multiple unrelated changes, which makes review hard. Chunking it into multiple smaller PRs would be a good start. I am interested in timestamp/date support.