data-validation

Library for exploring and validating machine learning data

Results 55 data-validation issues

In TensorFlow Data Validation, there is a method slicing_util.get_feature_value_slicer() to slice data based on a feature value. Is it possible to slice the data based on a date range using...

stat:awaiting tensorflower
type:support
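
To make the question above concrete, here is a minimal plain-Python sketch of what slicing by a date range could mean. It does not use TFDV's slicing API; the function name `make_date_range_slicer`, the `event_date` feature, and the row format (a list of dicts with ISO-formatted date strings) are all hypothetical choices made for illustration.

```python
from datetime import date


def make_date_range_slicer(feature_name, start, end):
    """Build a slicer that keeps rows whose date lies in [start, end].

    Hypothetical helper for illustration only; not part of TFDV.
    """
    slice_key = f"{feature_name}_{start.isoformat()}_to_{end.isoformat()}"

    def slicer(rows):
        for row in rows:
            # Assumes the feature holds an ISO date string such as "2019-01-15".
            value = date.fromisoformat(row[feature_name])
            if start <= value <= end:
                yield slice_key, row

    return slicer


# Made-up example rows.
rows = [
    {"event_date": "2019-01-15", "clicks": 3},
    {"event_date": "2019-02-02", "clicks": 7},
    {"event_date": "2019-03-20", "clicks": 1},
]

january_to_february = make_date_range_slicer(
    "event_date", date(2019, 1, 1), date(2019, 2, 28)
)
sliced = list(january_to_february(rows))
```

Each yielded pair carries a slice key plus the matching row, so statistics could then be grouped per key. Whether a function of this shape can be plugged into TFDV directly is not something this sketch establishes.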

I'm trying to run a TFDV process in a Kubeflow pipeline and visualize the results in the pipeline UI. For statistics, I can easily visualize using `get_statistics_html`. However, for schema and anomalies,...

stat:awaiting tensorflower
type:support

In the [TFDV Get Started](https://www.tensorflow.org/tfx/data_validation/get_started#inferring_a_schema_over_the_data) page, it states that: > TFDV also provides the `validate_instance` function for identifying whether an individual example exhibits anomalies when matched against a schema. To...

type:docs
stat:awaiting tensorflower
type:bug

I opened issue #101 about dealing with numerical features due to the need for ML data quality control in my company. I have made a small workaround suitable for our pipeline,...

cla: yes

I think it would be nice to have a top-level function to check for anomalies in serving data. It could be integrated into `serving_input_receiver_fn`. It doesn't make sense to have...

stat:awaiting tensorflower
type:feature

Hi, according to the TFX examples, I pass `pipeline_options`, which sets `--direct_num_workers=16`, to `generate_statistics_from_csv` like:
```python
pipeline_options = PipelineOptions(['--direct_num_workers=16'])
```
It seems that this option cannot speed up this...

stat:awaiting tensorflower
type:performance

It seems that we can't use INT with missing values. For example, using the schema and the csv below would fail to validate: schema.pbtxt: ``` feature { name: "f1" type:...

stat:awaiting tensorflower
type:support

Towards the goal of adding support for computing statistics over structured data (e.g., arbitrary protocol buffers, parquet data), `GenerateStatistics` API will take Arrow tables as input instead of `Dict[FeatureName, ndarray]`....

Announcement

Hi, it looks like the current CSV reader does not support the case where a quoted string value spans several lines (that is, it contains line breaks). It means a logical CSV...

stat:awaiting tensorflower
type:feature
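
The multi-line quoted value case described above can be reproduced with Python's standard `csv` module alone. The sketch below uses made-up sample data and says nothing about how TFDV's own CSV reader is implemented. It only contrasts naive line splitting, which breaks a logical record in two, with a parser that honors quoting.

```python
import csv
import io

# Made-up sample: the second record has a quoted value with an embedded newline.
raw = 'id,comment\n1,"first line\nsecond line"\n2,"single line"\n'

# Splitting on physical lines cuts the quoted value in half.
naive_rows = raw.splitlines()

# csv.reader treats the quoted newline as part of the field.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # physical lines
print(len(parsed_rows))  # logical records, including the header
print(parsed_rows[1])
```

Here the physical line count is 4 while the logical record count is 3, which is the mismatch that a reader assuming one record per line runs into.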