David Siegel

Results 3 comments of David Siegel

I have the same issue. `tfio.IODataset.from_parquet` is extremely slow when used with `interleave`, but it seems to work fine without `interleave`.

I found that moving the `batch` inside the `interleave` speeds things up a lot. Setting `deterministic=False` might also help, and `cycle_length` should be >= `num_workers`.

```python
ds.interleave(
    lambda f: tfio.IODataset.from_parquet(f, columns=columns).batch(2000),
    cycle_length=num_workers,...
```
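To make the effect of that reordering concrete, here is a toy model in pure Python. It is not `tf.data` or `tfio`, and it measures nothing about speed. The helpers `read_rows`, `batch`, and `interleave` are hypothetical stand-ins written for this sketch, and the round-robin logic is a simplification of what `Dataset.interleave` does. It only shows two things: with `batch` inside, far fewer elements pass through the interleave step, and each batch holds rows from a single file. Whether the lower element count is what explains the speedup is an assumption, not something stated above.

```python
from itertools import islice


def read_rows(file_id, n_rows):
    """Hypothetical per-file reader: yields (file_id, row_index) tuples."""
    for i in range(n_rows):
        yield (file_id, i)


def batch(it, size):
    """Group an iterator into lists of up to `size` items (last may be short)."""
    it = iter(it)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def interleave(sources, cycle_length):
    """Simplified round-robin: take one element at a time from each open source."""
    pending = iter(sources)
    active = [iter(s) for s in islice(pending, cycle_length)]
    while active:
        still_open = []
        for it in active:
            try:
                yield next(it)
                still_open.append(it)
            except StopIteration:
                # Source exhausted: open the next pending source, if any.
                nxt = next(pending, None)
                if nxt is not None:
                    still_open.append(iter(nxt))
        active = still_open


files = {"a": 5, "b": 5}  # hypothetical files with 5 rows each

# batch OUTSIDE interleave: single rows cross the interleave step.
rows_crossing = list(interleave([read_rows(f, n) for f, n in files.items()], 2))
outside = list(batch(rows_crossing, 4))

# batch INSIDE interleave: whole batches cross the interleave step.
inside = list(interleave([batch(read_rows(f, n), 4) for f, n in files.items()], 2))

print(len(rows_crossing), "elements interleaved with batch outside")
print(len(inside), "elements interleaved with batch inside")
```

One consequence to keep in mind: with `batch` inside, batches never mix rows from different files, and each file can end with a short batch.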

It's definitely something to do with compressedNodes. The problem starts happening as soon as the pipeline gets big enough to trigger Argo to start using the compressedNodes method of storing...