deequ

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
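
To make the "unit tests for data" idea concrete, here is a minimal plain-Python sketch. It does not use Deequ or Spark, and it is not Deequ's API: the helpers `is_complete`, `is_unique`, `has_size`, and `run_checks`, as well as the sample rows, are hypothetical stand-ins that only illustrate declaring constraints and evaluating them against a dataset.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Constraint = Callable[[List[Row]], bool]


def is_complete(column: str) -> Constraint:
    """Hypothetical constraint: no row has a missing (None) value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def is_unique(column: str) -> Constraint:
    """Hypothetical constraint: every value in `column` occurs exactly once."""
    def check(rows: List[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def has_size(predicate: Callable[[int], bool]) -> Constraint:
    """Hypothetical constraint: the row count satisfies `predicate`."""
    return lambda rows: predicate(len(rows))


def run_checks(rows: List[Row], checks: Dict[str, Constraint]) -> Dict[str, bool]:
    """Evaluate each named constraint and report pass/fail per constraint."""
    return {name: check(rows) for name, check in checks.items()}


# Made-up sample data: one row has a missing name.
rows = [
    {"id": 1, "name": "Thingy A"},
    {"id": 2, "name": None},
    {"id": 3, "name": "Thingy C"},
]

results = run_checks(rows, {
    "size >= 3": has_size(lambda n: n >= 3),
    "id is complete": is_complete("id"),
    "id is unique": is_unique("id"),
    "name is complete": is_complete("name"),
})

for name, passed in results.items():
    print(f"{name}: {'passed' if passed else 'FAILED'}")
```

In this sketch only the `name is complete` constraint fails, because one row has a missing name. A real Deequ check would run such constraints as Spark jobs over a DataFrame rather than over an in-memory list.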

Results: 166 deequ issues

Is there a simple quick start example for the usage of deequ in Spark SQL? If anyone has some suggestions or examples, that would be greatly helpful. Alternatively, it could...

I am trying to update my Databricks runtime to the newest version (DBR 11.0). However, the deequ package is not being installed properly. On the older Databricks runtimes the package...

Mean is calculated incorrectly when the values in the column are very large (example: epoch timestamps) and the dataset is large as well. Based on...

bug

*Issue #, if available:* Currently .hasPattern always fails for null values. *Description of changes:* I checked #342, so I added an isNullAllowed variable, and if isNullAllowed is true,...
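
The behaviour described in this snippet can be sketched in plain Python. This is not Deequ's implementation, and the description of the change is truncated, so the handling of the flag here is an assumption: `pattern_compliance` is a hypothetical function, `is_null_allowed` mirrors the `isNullAllowed` name from the snippet, and the email pattern and sample values are made up for illustration.

```python
import re
from typing import List, Optional


def pattern_compliance(
    values: List[Optional[str]],
    pattern: str,
    is_null_allowed: bool = False,
) -> float:
    """Return the fraction of values that satisfy `pattern`.

    A None value counts as a failure by default. With is_null_allowed=True
    it counts as compliant instead (assumed semantics of the flag).
    """
    if not values:
        return 1.0  # assumption: an empty column is vacuously compliant
    regex = re.compile(pattern)
    compliant = 0
    for value in values:
        if value is None:
            if is_null_allowed:
                compliant += 1
        elif regex.search(value):
            compliant += 1
    return compliant / len(values)


# Made-up sample column with one null value.
emails = ["a@example.com", None, "b@example.org"]
email_pattern = r"^[^@\s]+@[^@\s]+\.[a-z]+$"

strict = pattern_compliance(emails, email_pattern)
lenient = pattern_compliance(emails, email_pattern, is_null_allowed=True)
print(strict, lenient)
```

With the strict default, a check that requires full compliance fails as soon as the column contains a single null, which matches the complaint in the snippet. With the flag set, nulls no longer lower the compliance ratio.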

*Issue #, if available:* https://github.com/awslabs/deequ/issues/380 *Description of changes:* Support for Scala 2.13 and Spark 3.2. It is not fully done, as I faced two issues before I could even successfully...

When using the ColumnProfilerRunner function, how do I solve it? I also tried to work with it on my Mac and got the same error. Glue version 2.0, Spark 2.4, Python...

New features: 1. A Date Time Distribution analyzer for analyzing the distribution of records within fixed time intervals, based on a 'DateType' or 'TimestampType' feature. Files changed/created: DateTimeDistribution.scala DateTimeAggregation.scala DeequFunctions.scala...

enhancement
help wanted

I have around 50 columns in my table. When I try to run the entropy analyzer, it creates multiple jobs and they do not execute in parallel, while completeness is...
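
One likely explanation, offered as an assumption rather than a confirmed diagnosis of this issue: completeness is a simple aggregate that can be computed for all columns in one shared scan, whereas entropy is derived from the value frequencies of each column, which requires a separate grouping per column. The plain-Python sketch below illustrates that difference. It is not Deequ code; `completeness_all` and `entropy` are hypothetical helpers, and the natural logarithm is assumed.

```python
import math
from collections import Counter
from typing import Any, Dict, List

Row = Dict[str, Any]


def completeness_all(rows: List[Row], columns: List[str]) -> Dict[str, float]:
    """A single pass over the rows yields completeness for every column."""
    non_null = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            if row.get(column) is not None:
                non_null[column] += 1
    return {column: non_null[column] / len(rows) for column in columns}


def entropy(rows: List[Row], column: str) -> float:
    """Entropy needs the frequency of each distinct value in one column,
    so every column requires its own grouping step."""
    counts = Counter(
        row.get(column) for row in rows if row.get(column) is not None
    )
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values())


# Made-up sample data with two columns.
rows = [
    {"a": "x", "b": 1},
    {"a": "x", "b": None},
    {"a": "y", "b": 3},
    {"a": "y", "b": None},
]

completeness = completeness_all(rows, ["a", "b"])            # one pass, all columns
entropies = {column: entropy(rows, column) for column in ["a", "b"]}  # one grouping per column
print(completeness, entropies)
```

With around 50 columns, the first function still needs only one pass over the data, while the second needs 50 separate groupings, which would be consistent with the many jobs described above.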

Spark version: 3.2.1. Scala version: 2.13.8. This is what the error looks like: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession$ at deequ5$.main(Main.scala:11) at deequ5.main(Main.scala) Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession$ And...

*Description of changes:* This PR lifts the private restrictions on the SERDE classes. It also reorganizes the code to a single object/class per file for easier code navigation and discovery....