David Siegel
I have the same issue. `tfio.IODataset.from_parquet` is extremely slow when used with `interleave`, but it seems to work fine without `interleave`.
I found that moving the `batch` inside the `interleave` speeds things up a lot. Also, `deterministic=False` might help. `cycle_length` should be >= `num_workers`.

```python
ds.interleave(
    lambda f: tfio.IODataset.from_parquet(f, columns=columns).batch(2000),
    cycle_length=num_workers,...
```
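To illustrate why batching inside the `interleave` could matter, here is a rough pure-Python model of a round-robin interleave. It is not how `tf.data` is implemented, and the idea that per-element overhead is the bottleneck is only my guess at the cause. The helpers `batched` and `interleave`, plus the file and row counts, are made up for the example. It just counts how many elements have to pass through the interleave step in each arrangement.

```python
from itertools import islice


def batched(rows, size):
    """Group an iterable of rows into lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def interleave(sources, cycle_length):
    """Round-robin over up to `cycle_length` open sources, one element per turn.

    Simplified model only: no parallelism, no prefetching.
    """
    pending = iter(sources)
    active = list(islice(pending, cycle_length))
    out = []
    while active:
        still_open = []
        for src in active:
            try:
                out.append(next(src))
                still_open.append(src)
            except StopIteration:
                # Source exhausted: open the next pending source, if any.
                nxt = next(pending, None)
                if nxt is not None:
                    still_open.append(nxt)
        active = still_open
    return out


# Hypothetical workload: 4 "files" of 10,000 rows each, batch size 2000.
NUM_FILES, ROWS_PER_FILE, BATCH_SIZE, CYCLE_LENGTH = 4, 10_000, 2000, 2

# Batch outside: every single row passes through the interleave step.
rows_outside = interleave(
    (iter(range(ROWS_PER_FILE)) for _ in range(NUM_FILES)), CYCLE_LENGTH
)
batches_outside = list(batched(rows_outside, BATCH_SIZE))

# Batch inside: only whole batches pass through the interleave step.
batches_inside = interleave(
    (batched(range(ROWS_PER_FILE), BATCH_SIZE) for _ in range(NUM_FILES)),
    CYCLE_LENGTH,
)

print("elements through interleave, batch outside:", len(rows_outside))
print("elements through interleave, batch inside: ", len(batches_inside))
```

Both arrangements end up with the same number of batches and rows, but with the batch inside, the interleave handles far fewer elements. Note that the batch contents differ: batching inside keeps each batch within a single file, while batching outside mixes rows from several files.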
It's definitely something to do with compressedNodes. The problem starts happening as soon as the pipeline gets big enough to trigger Argo to start using the compressedNodes method of storing...