data-validation
Library for exploring and validating machine learning data
## Context When running `tfdv.generate_statistics_from_tfrecord` on Dataflow, the job gets submitted successfully to the cluster but I get a: `ImportError: No module named tensorflow_data_validation.statistics.stats_impl` during the job unpickling phase in...
The domain of a categorical int feature is included in my schema as a `string_domain` (generated by using `feature.int_domain.is_categorical = True`). However, when I try to run `tfdv.validate_instance()` on an...
I am using TFDV 1.2.0 and have a problem where I am consistently getting workers OOMing on Dataflow even with very large instance types (e.g. `n2-highmem-16` and `n2-highmem-32`). I've tried...
TensorFlow Data Validation supports the TensorFlow 2 API, but `setup.py` is missing version information for 2.3.x
When I generate statistics from a `.tfrecord` file with `generate_statistics_from_tfrecord`, its histograms contain weird float values as the `sample_count`s of the buckets. For example, in one bucket which is supposed...
Hi! Is there a way to avoid blocking execution when calling `generate_statistics_from_csv`? Maybe it could return an [Operation](https://github.com/googleapis/python-api-core/blob/master/google/api_core/operation.py#L50).
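One possible workaround for the blocking call, sketched here on the caller's side rather than as a TFDV feature: submit the call to a background thread and hand back a `Future`, which plays a role similar to the `Operation` object mentioned above. The function `blocking_generate_statistics` is a hypothetical stand-in for `tfdv.generate_statistics_from_csv`, and `generate_statistics_async` is an illustrative name, not part of any library.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def blocking_generate_statistics(data_location: str) -> dict:
    """Hypothetical stand-in for a blocking call such as
    tfdv.generate_statistics_from_csv(data_location=...)."""
    time.sleep(0.1)  # simulate a long-running job
    return {"data_location": data_location}


# A single worker thread is enough to keep the caller unblocked.
_executor = ThreadPoolExecutor(max_workers=1)


def generate_statistics_async(data_location: str) -> Future:
    """Start the blocking call in the background and return immediately."""
    return _executor.submit(blocking_generate_statistics, data_location)


future = generate_statistics_async("data.csv")
# The caller can do other work here, polling future.done() if needed.
stats = future.result()  # blocks only when the result is actually required
```

With this pattern the blocking call still runs to completion in the worker thread; only the caller is freed, and any exception raised by the job surfaces when `future.result()` is called.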
g3doc update: `feature_whitelist` has been deprecated and replaced with `feature_allowlist` in `StatsOptions`
/type feature Hi, since TF records are already converted to Pyarrow Tables to compute statistics, how hard would it be to add an option to read directly Pyarrow file or...