HybridBackend
Filter_func in parquet reader
User Story
With row-based storage formats such as TFRecord, a common pipeline is map, then filter, then batch. With Parquet, however, converting the data into a row-based dataset performs very poorly, and filtering after batching makes the batch size fluctuate drastically, because each batch loses a different number of rows. We therefore propose adding a filter_func parameter to the read_parquet interface so that users can get clean batches, with unwanted rows already removed, directly from the reader.
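The batch-size problem described above can be illustrated with a small pure-Python sketch (no TensorFlow or HybridBackend involved). The integer rows, the divisible-by-3 predicate, and the helper names batch and keep are hypothetical, chosen only to make the effect visible.

```python
def batch(rows, batch_size, drop_remainder=False):
    """Split a list of rows into consecutive batches of at most batch_size."""
    out = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    if drop_remainder and out and len(out[-1]) < batch_size:
        out.pop()  # discard the trailing partial batch
    return out


def keep(row):
    # Hypothetical predicate: keep rows divisible by 3.
    return row % 3 == 0


rows = list(range(16))

# Filtering after batching: each batch shrinks by a different amount.
filter_after = [[r for r in b if keep(r)] for b in batch(rows, 4)]
print([len(b) for b in filter_after])  # [2, 1, 1, 2]

# Filtering before batching: every batch except possibly the last is full.
filter_before = batch([r for r in rows if keep(r)], 4)
print([len(b) for b in filter_before])  # [4, 2]
```

Filtering before batching keeps the batch size stable, which is the behavior a filter_func inside read_parquet would aim for without first converting the columnar data into rows.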
Detailed requirements
Add a filter_func argument to hybridbackend.tensorflow.data.read_parquet:
hybridbackend.tensorflow.data.read_parquet(batch_size, fields=None, partition_count=1, partition_index=0, drop_remainder=False, num_parallel_reads=None, num_sequential_reads=1, filter_func=None, map_func=None)
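One possible semantics for the proposed filter_func is sketched below in pure Python, under stated assumptions: filter_func is assumed to receive one columnar chunk (for example, the rows read from one Parquet row group, represented here as a dict mapping column name to a list of values) and to return a boolean mask. Kept rows are buffered across chunks and re-batched so that every batch except possibly the last has exactly batch_size rows, and drop_remainder discards that last partial batch. The function filtered_batches, the mask-returning contract, and the chunk representation are hypothetical and are not HybridBackend's API; map_func and the other read_parquet arguments are not modeled.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Columns = Dict[str, List]  # hypothetical columnar chunk: column name -> values


def filtered_batches(
    chunks: Iterable[Columns],
    batch_size: int,
    filter_func: Optional[Callable[[Columns], List[bool]]] = None,
    drop_remainder: bool = False,
) -> Iterator[Columns]:
    """Apply filter_func to each columnar chunk, then re-batch to batch_size."""
    buffer: Columns = {}
    buffered = 0
    for chunk in chunks:
        if filter_func is not None:
            mask = filter_func(chunk)  # one bool per row in the chunk
            chunk = {name: [v for v, m in zip(col, mask) if m]
                     for name, col in chunk.items()}
        for name, col in chunk.items():
            buffer.setdefault(name, []).extend(col)
        buffered = len(next(iter(buffer.values()), []))
        # Emit full batches as soon as enough kept rows have accumulated.
        while buffered >= batch_size:
            yield {name: col[:batch_size] for name, col in buffer.items()}
            buffer = {name: col[batch_size:] for name, col in buffer.items()}
            buffered -= batch_size
    if buffered > 0 and not drop_remainder:
        yield buffer  # trailing partial batch


chunks = [
    {"label": [0, 1, 1, 0, 1], "id": [1, 2, 3, 4, 5]},
    {"label": [1, 1, 0], "id": [6, 7, 8]},
]


def positive_only(cols):
    # Hypothetical filter: keep rows whose label is 1.
    return [label == 1 for label in cols["label"]]


for b in filtered_batches(chunks, batch_size=2, filter_func=positive_only):
    print(b["id"])  # prints [2, 3], then [5, 6], then [7]
```

Returning a mask per chunk, rather than calling a predicate per row, keeps the filter columnar, which matches the motivation of avoiding a row-based conversion; whether the real implementation would work this way is an open design question.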
API Compatibility
Must support at least TensorFlow 1.14 and 1.15.