Features and enhancements done

Open Yash0215 opened this issue 4 years ago • 15 comments

New Features:

  1. A Date Time Distribution analyzer for analyzing the distribution of records based on a 'DateType' or 'TimestampType' feature within fixed time intervals. Files changed/created: DateTimeDistribution.scala, DateTimeAggregation.scala, DeequFunctions.scala ...

  2. Six new constraints added, covering more use cases for DateTime quality checks. Files changed/created: Check.scala, Constraint.scala

  3. Constraint 'isContainedIn' now supports more Scala numeric types. Files changed/created: Check.scala
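To illustrate what "distribution within fixed time intervals" could mean for the first feature, here is a minimal sketch in plain Scala. It is not the code from DateTimeDistribution.scala and uses no Spark or deequ API; the object name `DateTimeDistributionSketch` and the method `distribution` are hypothetical, and aligning buckets to the Unix epoch is an assumption.

```scala
import java.time.{Duration, Instant}

// Hypothetical helper, not part of deequ: counts records per fixed-width interval.
object DateTimeDistributionSketch {

  /** Returns a map from interval start to the number of timestamps in that interval. */
  def distribution(timestamps: Seq[Instant], interval: Duration): Map[Instant, Long] = {
    val widthMillis = interval.toMillis
    require(widthMillis > 0, "interval must be at least one millisecond")
    timestamps
      .groupBy { ts =>
        // Floor the epoch millis to the start of the enclosing interval
        // (assumption: intervals are aligned to the Unix epoch).
        val bucketStart = Math.floorDiv(ts.toEpochMilli, widthMillis) * widthMillis
        Instant.ofEpochMilli(bucketStart)
      }
      .map { case (start, group) => start -> group.size.toLong }
  }
}
```

With hourly intervals, for example, timestamps at 15:09 and 15:40 on the same day would fall into the bucket starting at 15:00, and one at 16:05 into the bucket starting at 16:00.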

Enhancements:

  1. Issue: Timestamp support #47. A new State and Metric are implemented for this enhancement, since the previous analyzers only support the Double Metric and the Standard analyzer. A new abstract analyzer for timestamp analysis is also implemented. Files changed/created: MinimumDateTime.scala, MaximumDateTime.scala (for the new analyzer implementation), Analyzer.scala

  2. Analyzer for Precision and Scale of BigDecimals #46. A new State and Metric are implemented, along with new analyzers that provide the precision and scale of Spark's 'DecimalType'. Files changed/created: Minimum.scala, Maximum.scala, Sum.scala, Mean.scala (for the new analyzer implementation), Analyzer.scala
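As a rough illustration of the precision and scale idea in the second enhancement, the sketch below computes the largest precision and scale over a collection of `java.math.BigDecimal` values. It is an assumption about what such a metric might report, not the analyzer from this PR, and it does not touch Spark's 'DecimalType'; `DecimalShapeSketch`, `DecimalShape`, and `shape` are hypothetical names.

```scala
import java.math.BigDecimal

// Hypothetical helper, not part of deequ: summarizes precision and scale of decimals.
object DecimalShapeSketch {

  /** Largest precision (total digits) and scale (digits after the point) seen. */
  final case class DecimalShape(maxPrecision: Int, maxScale: Int)

  /** Returns None for empty input, since no precision or scale can be observed. */
  def shape(values: Seq[BigDecimal]): Option[DecimalShape] =
    if (values.isEmpty) None
    else Some(DecimalShape(values.map(_.precision()).max, values.map(_.scale()).max))
}
```

Returning an `Option` keeps the empty case explicit, which mirrors how a metric over an empty column has no meaningful value.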

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Yash0215 avatar Sep 29 '20 15:09 Yash0215

Thanks! Could you look at the CI failures and fix them?

sscdotopen avatar Oct 07 '20 08:10 sscdotopen

Hi, is this feature now available?

lucene avatar Oct 15 '20 09:10 lucene

> Thanks! Could you look at the CI failures and fix them?

Sorry for the late reply. I'll work on it.

Yash0215 avatar Oct 15 '20 09:10 Yash0215

This is a valuable feature you've added, @Yash0215. Awesome. Looking forward to using it.

lucene avatar Oct 15 '20 11:10 lucene

Can I assist in any way, @Yash0215? I'm looking to utilise this functionality ASAP.

lucene avatar Oct 15 '20 13:10 lucene

Please let me know if any changes are needed.

Yash0215 avatar Oct 26 '20 06:10 Yash0215

So the two enhancements implemented in this PR are essentially orthogonal, right? If so, I'd recommend we split them into separate PRs and focus on each.

aviatesk avatar Oct 30 '20 08:10 aviatesk

Yes, that would make a lot of sense.

sscdotopen avatar Oct 30 '20 09:10 sscdotopen

Hello @Yash0215 @sscdotopen, were there further developments on this work? If not, I can volunteer to take it forward.

rounakdatta avatar Feb 26 '21 17:02 rounakdatta

FWIW, I forked this project and have been actively developing it since. See this changelog for the enhancements/bugfixes.

I'm also thinking of announcing it as an active fork of this project somewhere (maybe as an issue for this repo) in the near future.

aviatesk avatar Feb 26 '21 17:02 aviatesk

Hi, thanks so much for introducing all these changes. Unfortunately, we currently don't have the availability to give this a proper review. We will keep this PR in the backlog for now. If you have the opportunity to submit a couple of smaller PRs, that would be great. It's hard to find the time to do big reviews, and a few smaller PRs could help us understand the main ideas and make progress on this.

twollnik avatar Jul 20 '21 14:07 twollnik

@Yash0215 Please get back to us on this if you get the chance. We are considering closing this PR soon.

twollnik avatar Oct 19 '21 07:10 twollnik

@twollnik @Yash0215 - Will this PR be merged to improve deequ to handle timestamp/date support?

RunnX avatar Jun 14 '22 01:06 RunnX

Any update on this, to improve deequ's handling of timestamps/dates?

suadhika avatar Feb 20 '23 11:02 suadhika

This PR is quite big, with multiple unrelated changes, which makes review hard. Chunking this into multiple smaller PRs would be a good start. I am interested in timestamp/date support.

zeotuan avatar Apr 18 '24 11:04 zeotuan