data-validation
Issue using `allowlist_features` and `denylist_features` in `visualize_statistics`
Overview
I'm having trouble specifying which features to include or exclude when visualizing statistics in TFDV. The `allowlist_features` and `denylist_features` arguments appear to require `tensorflow_data_validation.types.FeaturePath` objects, which took me a while to figure out how to construct. This doesn't seem very user-friendly. Was it intended to allow a list of strings to be passed?
Code to reproduce
I can reproduce the problem in the public colab example. In the "Compute and Visualize Statistics" section of that notebook, update the `visualize_statistics` call to `tfdv.visualize_statistics(train_stats, denylist_features=['pickup_community_area'])`. The first feature shouldn't appear in the resulting visualization (if I'm calling this correctly).

Workaround code
To make this work, I have to manually construct a `tensorflow_data_validation.types.FeaturePath` object. Perhaps it would be better to do the filter comparison on each feature's `path` string?
# Show string name of feature
first_feat = train_stats.datasets[0].features[0]
print(first_feat.path)
# Construct the FeaturePath object needed to make the `allowlist_features` filter work
from tensorflow_data_validation import types
print(types.FeaturePath.from_proto(first_feat.path))
# docs-infra: no-execute
tfdv.visualize_statistics(train_stats, allowlist_features=[types.FeaturePath.from_proto(first_feat.path)])
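
Until plain strings are accepted, a small conversion helper might make the workaround less awkward. The following is only a sketch: `to_feature_paths` and its `path_factory` parameter are hypothetical names, not part of TFDV, and it assumes `types.FeaturePath` can be constructed from an iterable of step names. A stand-in factory (`tuple`) is used so the snippet runs without TFDV installed.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar, Union

PathT = TypeVar("PathT")
# A feature is given either as a plain name or as a sequence of path steps.
FeatureSpec = Union[str, Sequence[str]]


def to_feature_paths(
    features: Iterable[FeatureSpec],
    path_factory: Callable[[List[str]], PathT],
) -> List[PathT]:
    """Convert feature names (or step sequences) into path objects.

    Hypothetical helper: `path_factory` is whatever builds a path from a
    list of steps, e.g. `types.FeaturePath` (assumed to accept the steps).
    """
    paths = []
    for feature in features:
        # A bare string is treated as a single-step path; anything else is
        # assumed to already be a sequence of steps for a nested feature.
        steps = [feature] if isinstance(feature, str) else list(feature)
        paths.append(path_factory(steps))
    return paths


# Stand-in factory so the sketch runs without TFDV installed.
print(to_feature_paths(["pickup_community_area", ("parent", "child")], tuple))

# With TFDV available (assumption: FeaturePath takes an iterable of steps):
# from tensorflow_data_validation import types
# denylist = to_feature_paths(["pickup_community_area"], types.FeaturePath)
# tfdv.visualize_statistics(train_stats, denylist_features=denylist)
```

Passing the factory in as an argument keeps the helper independent of the TFDV import, so the same string-to-path conversion can be checked in isolation before it is used with `visualize_statistics`.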
