deequ

Removal of a metric from a metric repository

Open JonathonShields opened this issue 3 years ago • 1 comment

Is it possible to remove a metric from a metric repository? I have had a look at the APIs and see no obvious way of doing this. Even the ability to update a metric would help: for example, I could add a status tag to its key and set it to 'cancelled'. But again, I see no means of doing that.

The use case is that I receive data in batches. Each batch is assessed for data quality as it comes in, and the metrics for each batch are saved during its ingestion. At some later point, it may be decided that a particular batch of data needs to be removed from the system. That removal should also remove any metrics related to the batch from the metric repository.
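To make the requirement concrete, here is a minimal sketch of the behaviour being asked for, written in plain Python rather than against deequ itself. It assumes the metrics live in a single JSON file holding a list of records, each with a `resultKey` that carries a `tags` map. That record layout, the `batch_id` tag, and the `remove_batch` helper are all assumptions for illustration, not deequ's API or its actual on-disk format.

```python
import json
from pathlib import Path


def remove_batch(path, batch_id):
    """Drop every stored result tagged with batch_id; return how many were removed."""
    path = Path(path)
    results = json.loads(path.read_text())

    # Keep only the results whose (assumed) batch_id tag does not match.
    kept = [
        r for r in results
        if r["resultKey"]["tags"].get("batch_id") != batch_id
    ]

    # Write to a temporary file first, then swap it in, so a crash
    # mid-write does not leave a truncated store behind.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(kept))
    tmp.replace(path)

    return len(results) - len(kept)
```

Calling `remove_batch("metrics.json", "batch-42")` would rewrite the file without that batch's results. The cost is a full read and rewrite of the store for every removal.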

Any recommendations on the best way to achieve this given the current APIs?

Many thanks.

JonathonShields, Apr 27 '21 13:04

Chiming in on this. There doesn't seem to be an obvious way to handle recomputed data. E.g. I might process and store metrics for a year's data, then some problem with the underlying data requires reprocessing and, thus, recomputing the metrics.

There doesn't seem to be a clear way to remove or replace a subset of the old metrics with the new ones for this data.

It also raises the question of why JSON was chosen as the default storage medium over a partitioned table format. A partitioned format would allow something like overwriting the data for a single partition when it is recomputed.
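A rough sketch of what partition-level overwriting could look like, again in plain Python and not tied to deequ: each partition (a batch, a day, a month) gets its own file, so recomputing replaces one file and removal deletes one file. The `PartitionedMetricStore` class, its method names, and the `partition=<name>.json` file naming are hypothetical, and the sketch assumes partition names are simple strings without path separators.

```python
import json
from pathlib import Path


class PartitionedMetricStore:
    """Hypothetical store: one JSON file per partition, e.g. per batch or per day."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _file(self, partition):
        return self.root / f"partition={partition}.json"

    def overwrite(self, partition, metrics):
        # Recomputed metrics replace whatever the partition held before.
        self._file(partition).write_text(json.dumps(metrics))

    def load(self, partition):
        # Returns None when the partition has no stored metrics.
        f = self._file(partition)
        return json.loads(f.read_text()) if f.exists() else None

    def drop(self, partition):
        # Removing a partition deletes exactly one file; others are untouched.
        self._file(partition).unlink(missing_ok=True)
```

With this layout, `store.overwrite("2021-04", new_metrics)` after a reprocessing run leaves every other month's metrics as they were, and `store.drop("2021-04")` covers the removal case from the original question.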

jameskyle, Jul 16 '21 18:07