`ConstraintSuggestionRunner` failing with `Empty state for analyzer Completeness, all input values were NULL`
Deequ version: 2.0.0-spark-3.1
I'm using `ConstraintSuggestionRunner` to generate a first-cut set of constraints for my data, and have hit this issue.
When using the following function to generate suggested checks on a DataFrame that has a nullable field, `total_clicks`, containing all nulls:
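    import com.amazon.deequ.suggestions.{ConstraintSuggestionResult, ConstraintSuggestionRunner, Rules}
    import org.apache.spark.sql.DataFrame

    def defaultChecks(data: DataFrame): ConstraintSuggestionResult = {
      /* Have deequ suggest checks */
      ConstraintSuggestionRunner()
        .onData(data)
        .addConstraintRules(Rules.DEFAULT)
        .run()
    }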
I get this exception:
uncaught exception: com.amazon.deequ.analyzers.runners.EmptyStateException: Empty state for analyzer Completeness(total_clicks,None), all input values were NULL.
at com.amazon.deequ.analyzers.Analyzers$.emptyStateException(Analyzer.scala:482)
at com.amazon.deequ.analyzers.Analyzers$.metricFromEmpty(Analyzer.scala:491)
at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.computeMetricFrom(Analyzer.scala:211)
at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.computeMetricFrom(Analyzer.scala:200)
at com.amazon.deequ.analyzers.Analyzer.calculateMetric(Analyzer.scala:127)
at com.amazon.deequ.analyzers.Analyzer.calculateMetric$(Analyzer.scala:107)
at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.calculateMetric(Analyzer.scala:200)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:194)
at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult$(Analyzer.scala:185)
at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:200)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.successOrFailureMetricFrom(AnalysisRunner.scala:362)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$5(AnalysisRunner.scala:330)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.map(TraversableLike.scala:238)
at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
at scala.collection.immutable.List.map(List.scala:298)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:328)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
...
The elided remainder of the traceback comes from the sample code above and the code calling it.
I would expect this to either warn me that the column was null and ignore it, or, better, generate a constraint of `hasCompleteness(0.0)` for the all-null column.
I have a workaround of dropping the column prior to running it through `ConstraintSuggestionRunner`, but I'd prefer that `ConstraintSuggestionRunner` be made more resilient.