Sebastian Schelter
The workaround is fine, no need to address this urgently
No urgent need to fix this; I found a way to make it work (and to apply mlinspect to an image classification pipeline with CNNs).
You can use a check based on a regular expression: https://github.com/awslabs/deequ/blob/master/src/main/scala/com/amazon/deequ/checks/Check.scala#L689
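To illustrate the idea behind a regular-expression check, here is a minimal plain-Python sketch. It is not Deequ's API: `pattern_compliance`, the sample values, the email pattern, and the 0.9 threshold are all hypothetical and only show how the fraction of matching values can be compared against a constraint.

```python
import re

def pattern_compliance(values, pattern):
    """Return the fraction of values that fully match the regular expression."""
    regex = re.compile(pattern)
    if not values:
        return 1.0  # assumption of this sketch: an empty column trivially complies
    matches = sum(1 for v in values if regex.fullmatch(v))
    return matches / len(values)

# Hypothetical column of email addresses with one malformed entry.
emails = ["a@example.com", "not-an-email", "b@example.org"]
compliance = pattern_compliance(emails, r"[^@\s]+@[^@\s]+\.[a-z]+")

# The check passes only if at least 90% of the values match the pattern.
check_passed = compliance >= 0.9
```

With the sample data above, two of three values match, so the check fails.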
At the moment, deequ does not support any metrics calculations on timestamp/date columns. The task here would be to integrate those. A problem is that most of our analyzers produce...
Let's try to make it support timestamps in addition to what it supports now. In general, we only operate on Spark's supported column types.
We would be very happy to receive such a PR!
Deequ only allows you to run anomaly detection algorithms on the metrics computed from the data, not on the data itself.
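The distinction between the data and the metrics computed from it can be sketched in plain Python. This is an illustration under assumptions, not Deequ's implementation: `size_metric`, `is_anomalous`, and the rate thresholds are hypothetical names and values. The detector only ever sees one number per snapshot, never the rows themselves.

```python
def size_metric(rows):
    """The metric in this sketch is simply the number of rows in a snapshot."""
    return len(rows)

def is_anomalous(previous, current, max_rate_increase=2.0, max_rate_decrease=0.5):
    """Flag the new metric value if it changed too much relative to the previous one."""
    if previous == 0:
        return current != 0
    ratio = current / previous
    return ratio > max_rate_increase or ratio < max_rate_decrease

# Two hypothetical snapshots of the same dataset on consecutive days.
yesterday = [{"id": i} for i in range(100)]
today = [{"id": i} for i in range(250)]

# Only the metric values are stored and compared.
metric_history = [size_metric(yesterday)]
new_value = size_metric(today)
anomaly = is_anomalous(metric_history[-1], new_value)
```

Here the row count grows by a factor of 2.5, which exceeds the assumed maximum increase of 2.0, so the new value is flagged.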
Hi Valentin, great to see you working on Deequ! We have something like this already in the profiler, but it does not use the aggregator API, maybe you want to...
I don't think we should run all aggregations via the aggregator API, because there are some aggregations which might be run on high-cardinality columns (e.g. testing whether a key column...
You can use the row level schema validator for this: https://github.com/awslabs/deequ/blob/db63229e83bf60da0f7cff323f081b2490578b38/src/main/scala/com/amazon/deequ/schema/RowLevelSchemaValidator.scala