Quentin Lhoest
Can you remove `auto_rename_labels`? I don't think it's a good idea to add it if the plan is to remove it later.
> Not yet sure if it's good for modalities like images. We store images pretty much the same way as tensorflow_datasets (i.e. storing the encoded image bytes, or a path...
> So for image datasets, we could potentially store the paths in the feather format and decode and read them on the fly? Hopefully yes :) I double-checked the TFDS...
Cool, thanks! If I understand correctly, in your PoC you store the flattened array of pixels in the feather file. This will take a lot of disk space. Maybe...
Cool, thanks! Too bad the Arrow binary type doesn't seem to be supported in `arrow_io.ArrowFeatherDataset` :/ We would also need it to support the Arrow struct type. Indeed images in...
> IIUC, in my [latest PoC notebook](https://gist.github.com/sayakpaul/f7d5cc312cd01cb31098fad3fd9c6b59#file-feather-tf-poc-bytes-ipynb), you wanted to see each entry in the feather file to be represented like so? > > pa.struct({"path": pa.string(), "bytes": pa.binary()}) Yea because...
> I don't understand this. I would think TFRecords would also need something similar but I need the context you're coming from. Users already have a copy of the dataset...
If `to_tf_dataset` can be unbatched, then it should be fairly easy for users to convert the TF dataset to TFRecords, right?
Would someone like to dive into tfio to fix this? It sounds like a good opportunity to learn the best ways to load a dataset for...
Thanks! I think the speed difference can be partly explained: you use `ds.shuffle` in your dataset, which is an exact shuffle (whereas TFDS does buffer shuffling): it...
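To make the exact vs. buffer shuffling distinction concrete, here is a small pure-Python sketch. It is not the implementation used by either library; the function names `exact_shuffle` and `buffer_shuffle` and the `seed` parameter are made up for illustration.

```python
import random


def exact_shuffle(items, seed=0):
    """Exact shuffle: needs every item in memory, any item can land anywhere."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled


def buffer_shuffle(items, buffer_size, seed=0):
    """Buffer shuffle: streams the items and only mixes those currently in the buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered item and put the incoming one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Drain what is left at the end of the stream, in random order.
    rng.shuffle(buffer)
    yield from buffer


data = list(range(20))
print(exact_shuffle(data))
print(list(buffer_shuffle(data, buffer_size=4)))
```

With a buffer of size `b`, the item emitted at output position `k` always comes from the first `k + b` inputs, so the data is read mostly sequentially. An exact shuffle has no such bound, which means reads can jump anywhere in the dataset.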