`ConstraintSuggestionRunner` failing with `Empty state for analyzer Completeness, all input values were NULL`

Open joemcmahon opened this issue 2 years ago • 0 comments

Deequ version: 2.0.0-spark-3.1

I'm using `ConstraintSuggestionRunner` to generate a first-cut set of constraints for my data, and have hit this issue.

When I use the following function to generate suggested checks on a DataFrame that has a nullable field, `total_clicks`, containing only nulls:

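    import com.amazon.deequ.suggestions.{ConstraintSuggestionResult, ConstraintSuggestionRunner, Rules}
    import org.apache.spark.sql.DataFrame

    def defaultChecks(data: DataFrame): ConstraintSuggestionResult = {
      // Have deequ suggest checks using the default rule set
      ConstraintSuggestionRunner()
        .onData(data)
        .addConstraintRules(Rules.DEFAULT)
        .run()
    }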

I get this exception:

uncaught exception: com.amazon.deequ.analyzers.runners.EmptyStateException: Empty state for analyzer Completeness(total_clicks,None), all input values were NULL.
	at com.amazon.deequ.analyzers.Analyzers$.emptyStateException(Analyzer.scala:482)
	at com.amazon.deequ.analyzers.Analyzers$.metricFromEmpty(Analyzer.scala:491)
	at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.computeMetricFrom(Analyzer.scala:211)
	at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.computeMetricFrom(Analyzer.scala:200)
	at com.amazon.deequ.analyzers.Analyzer.calculateMetric(Analyzer.scala:127)
	at com.amazon.deequ.analyzers.Analyzer.calculateMetric$(Analyzer.scala:107)
	at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.calculateMetric(Analyzer.scala:200)
	at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:194)
	at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult$(Analyzer.scala:185)
	at com.amazon.deequ.analyzers.StandardScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:200)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.successOrFailureMetricFrom(AnalysisRunner.scala:362)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$5(AnalysisRunner.scala:330)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at scala.collection.TraversableLike.map(TraversableLike.scala:238)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
	at scala.collection.immutable.List.map(List.scala:298)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:328)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
	at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
	at com.amazon.deequ.analyzers.runners.AnalysisRunBuilder.run(AnalysisRunBuilder.scala:110)
	at com.amazon.deequ.profiles.ColumnProfiler$.profile(ColumnProfiler.scala:141)
	at com.amazon.deequ.profiles.ColumnProfilerRunner.run(ColumnProfilerRunner.scala:72)
	at com.amazon.deequ.profiles.ColumnProfilerRunBuilder.run(ColumnProfilerRunBuilder.scala:185)
	at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.profileAndSuggest(ConstraintSuggestionRunner.scala:203)
	at com.amazon.deequ.suggestions.ConstraintSuggestionRunner.run(ConstraintSuggestionRunner.scala:102)
	at com.amazon.deequ.suggestions.ConstraintSuggestionRunBuilder.run(ConstraintSuggestionRunBuilder.scala:226)
       ...

The elided remainder of the traceback comes from the sample code supplied above and the code calling it.

I would expect this to either warn me that the column is entirely null and skip it, or, better, generate a constraint of hasCompleteness(0.0) for the all-null column.

I have a workaround of dropping the column prior to running it through ConstraintSuggestionRunner, but I'd prefer that ConstraintSuggestionRunner be made more resilient.
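For anyone hitting the same exception, here is a minimal sketch of that workaround. The helper names `allNullColumns` and `defaultChecksSkippingNullColumns` are my own and not part of deequ. It assumes top-level column names without dots or other characters that Spark's `col` would treat specially. Note that it costs one extra pass over the data, and on an empty DataFrame every column counts as all-null and is dropped.

```scala
import com.amazon.deequ.suggestions.{ConstraintSuggestionResult, ConstraintSuggestionRunner, Rules}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

// Hypothetical helper (not a deequ API): names of columns with no non-null value.
def allNullColumns(data: DataFrame): Seq[String] = {
  val names = data.columns.toSeq
  if (names.isEmpty) Seq.empty
  else {
    // count(column) only counts non-null values, so 0 means the column is entirely NULL.
    // A global aggregation always yields exactly one row, so head() is safe here.
    val nonNullCounts = data.select(names.map(c => count(col(c))): _*).head()
    names.zipWithIndex.collect {
      case (name, i) if nonNullCounts.getLong(i) == 0L => name
    }
  }
}

// Hypothetical wrapper: drop all-null columns, then ask deequ for suggestions.
def defaultChecksSkippingNullColumns(data: DataFrame): ConstraintSuggestionResult = {
  val cleaned = data.drop(allNullColumns(data): _*)
  ConstraintSuggestionRunner()
    .onData(cleaned)
    .addConstraintRules(Rules.DEFAULT)
    .run()
}
```

The dropped column names are available from `allNullColumns(data)`, so the caller can log them or add a completeness check for them by hand.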

joemcmahon, Mar 30 '22 20:03