[DataPipe] DataPipe Deprecation Tracker

Open · NivekT opened this issue 3 years ago · 10 comments

We have a number of DataPipes that are being deprecated. Our general policy is to first mark a DataPipe as deprecated with a warning, then wait at least one release cycle (~3 months) before removing it. Note that some DataPipes will be removed from the PyTorch Core library but will remain in TorchData, and others are being renamed.

Status Types:

  • Deprecated - marked as deprecated with a warning
  • Removed - removed from repository
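The deprecate-then-remove policy above can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical rename from `OldReader` to `NewReader`; it is not the actual PyTorch/TorchData implementation, which uses its own internal helpers.

```python
import warnings


class NewReader:
    """Hypothetical replacement DataPipe-like class (illustrative only)."""

    def __init__(self, source):
        self.source = source

    def __iter__(self):
        yield from self.source


class OldReader(NewReader):
    """Hypothetical deprecated name kept for at least one release cycle.

    It behaves exactly like NewReader but emits a warning on construction,
    so existing user code keeps working while users migrate.
    """

    def __init__(self, source):
        warnings.warn(
            "OldReader is deprecated and will be removed in a future release; "
            "please use NewReader instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        super().__init__(source)


# Usage: the old name still works but records a DeprecationWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    items = list(OldReader([1, 2, 3]))
```

`DeprecationWarning` is hidden by default unless it is triggered by code in `__main__`, so a library that wants end users to notice the rename might prefer `FutureWarning`.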

DataLoader2 Tracker

| Name | Deprecation Version | Status | Earliest Removal Version |
|---|---|---|---|
| PrototypeMultiProcessingReadingService -> MultiProcessingReadingService | 0.6 | Deprecated | 0.8 |

IterDataPipe Tracker

| Name | Functional API | Module | Deprecation Date or Version | Status | Earliest Removal Version |
|---|---|---|---|---|---|
| BucketBatcher | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| HTTPReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| LineReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| TarArchiveReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| ZipArchiveReader | NA | Core | Sep 30th, 2021 | Removed (moved to TorchData) | |
| FileLoader | NA | Core | Jan 5th, 2022 | Removed (use FileOpener) | 1.13 (Sept 2022) |
| FileLoader | NA | Data | Jan 5th, 2022 | Removed (use FileOpener) | |
| IoPathFileLoader | load_file_by_iopath | Data | Jan 5th, 2022 | Removed (use IoPathFileOpener) | |
| RoutedDecoder | routed_decode | Core | Jan 10th, 2022 | Deprecated | 1.13 (Sept 2022) |
| TarArchiveReader | read_from_tar | Data | Feb 22nd, 2022 | Removed (use TarArchiveLoader) | 0.5 (Sept 2022) |
| XzFileReader | read_from_xz | Data | Feb 22nd, 2022 | Removed (use XzFileLoader) | 0.5 (Sept 2022) |
| ZipArchiveReader | read_from_zip | Data | Feb 22nd, 2022 | Removed (use ZipArchiveLoader) | 0.5 (Sept 2022) |
| Filter | filter | Core | 1.12 | Removed argument (drop_empty_batches) | 2.0 (Nov 2022) |
| FSSpecFileOpener | open_files_by_fsspec | Data | 0.4 | Removed (old functional name open_file_by_fsspec) | 0.6 (Nov 2022) |
| IoPathFileOpener | open_files_by_iopath | Data | 0.4 | Removed (old functional name open_file_by_iopath) | 0.6 (Nov 2022) |

MapDataPipe Tracker

Nothing for now

cc: @ejguan @VitalyFedyunin @NivekT

NivekT · Jan 11 '22 18:01

For TarArchiveReader, should we add a deprecation warning in the main branch now that the 0.3.0 branch cut is finished?

ejguan · Mar 01 '22 19:03

Another miscellaneous tracker:

| Name | Module | Deprecation Version | Status | Earliest Removal Version |
|---|---|---|---|---|
| torch.utils.data.graph.traverse | Core | 1.13 | Deprecating | 1.15 / 2.1 |

ejguan · Sep 19 '22 20:09

I see RoutedDecoder has been marked as deprecated: what is it going to be replaced by?

BlueskyFR · Nov 23 '22 16:11

@BlueskyFR IIRC, we plan to remove this DataPipe in the future. The general reason is that we think the same result can easily be achieved by using demux to split the DataPipe by file type, decoding each resulting DataPipe accordingly, and then using mux to combine them again. We'd be glad to hear about your use case.

ejguan · Nov 23 '22 16:11
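The demux, decode, then mux approach described above can be sketched in plain Python. This is a dependency-free illustration of the data flow only: `demux`, `merge_round_robin`, and `by_extension` are hypothetical helpers written for this example, not the DataPipe implementations, and the `(file name, raw bytes)` sample format is an assumption.

```python
import json
from itertools import zip_longest
from typing import Callable, Iterable, Iterator, List, Tuple

# Assumed sample format: (file name, raw bytes), as a file-loading step might yield.
Sample = Tuple[str, bytes]


def demux(samples: Iterable[Sample], classifier: Callable[[Sample], int],
          num_instances: int) -> List[List[Sample]]:
    """Split samples into `num_instances` branches by classifier index.

    Eager, list-based simplification; a streaming demux buffers lazily instead.
    """
    branches: List[List[Sample]] = [[] for _ in range(num_instances)]
    for sample in samples:
        branches[classifier(sample)].append(sample)
    return branches


def merge_round_robin(*streams: Iterable) -> Iterator:
    """Interleave streams one item at a time until *all* of them are exhausted."""
    missing = object()
    for group in zip_longest(*streams, fillvalue=missing):
        for item in group:
            if item is not missing:
                yield item


def by_extension(sample: Sample) -> int:
    # 0 -> JSON branch, 1 -> plain-text branch
    return 0 if sample[0].endswith(".json") else 1


samples = [("a.json", b'{"x": 1}'), ("b.txt", b"hello"), ("c.json", b"[2, 3]")]
json_branch, text_branch = demux(samples, by_extension, num_instances=2)

# Decode each branch with its own function (the role of .map(decode_fn)).
decoded_json = map(lambda s: (s[0], json.loads(s[1])), json_branch)
decoded_text = map(lambda s: (s[0], s[1].decode("utf-8")), text_branch)

result = list(merge_round_robin(decoded_json, decoded_text))
```

The merge here continues until every branch is empty, so no decoded sample is dropped when branches differ in length. PyTorch's `mux` is documented to stop when the shortest input is exhausted, while TorchData provides `mux_longest`; check the documentation for the version you use before choosing one.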

I don't understand: how should I proceed to decode a PNG image in the current state then?

BlueskyFR · Nov 23 '22 18:11

You can use a map function, such as datapipe.map(decode_fn), to decode the PNG image.

ejguan · Nov 23 '22 19:11
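As a concrete, dependency-free sketch of a `decode_fn` for PNG bytes, the function below only parses the IHDR header (width, height, bit depth, color type) with the standard `struct` module. That restriction is an assumption made to keep the example self-contained; a real pipeline would typically hand the bytes to an image library to obtain pixel data. The `(file name, raw bytes)` input format and the helper names are hypothetical, and the built-in `map` stands in for `datapipe.map(decode_fn)`.

```python
import struct
import zlib
from typing import Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_png_header(data: bytes) -> Dict[str, int]:
    """Parse the IHDR chunk, which must be the first chunk in a PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Chunk layout: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}


def decode_fn(sample: Tuple[str, bytes]) -> Tuple[str, Dict[str, int]]:
    """Hypothetical per-sample decoder: keep the path, replace bytes with metadata."""
    path, data = sample
    return path, decode_png_header(data)


def make_png_header(width: int, height: int) -> bytes:
    """Build a signature + IHDR chunk for testing (8-bit RGB, no pixel data)."""
    payload = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    crc = zlib.crc32(b"IHDR" + payload) & 0xFFFFFFFF  # CRC covers type + payload
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + payload + struct.pack(">I", crc)


samples = [("cat.png", make_png_header(640, 480)), ("dog.png", make_png_header(32, 16))]
decoded = list(map(decode_fn, samples))
```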

Okay, but why was support for decoding dropped then?

BlueskyFR · Nov 23 '22 19:11

Decoding didn't do anything more than a map function, except that we provided a few decoding functions for convenience. To support routed_decode, we would need to add many more decoding functions to cover general file decoding, which is not sustainable for us to maintain and makes routed_decode more complicated and redundant. For example, in your use case (decoding PNG), routed_decode would bundle additional decoding handlers, such as json and pickle, into this DataPipe.

Since TorchData provides a composable way to construct pipelines, users should be able to build a pipeline that handles their specific decoding mechanism.

ejguan · Nov 23 '22 19:11

Okay. What is the preferred mechanism to decode images? Ideally, I think it should be done in batches if performance matters.

BlueskyFR · Nov 23 '22 19:11

It depends on whether your decode_fn supports high-performance batched decoding (e.g., multithreading). Otherwise, I think it will perform about the same as decoding each image individually.

ejguan · Nov 23 '22 19:11
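To make that trade-off concrete: batching only pays off if the per-batch decoder does something smarter than a plain loop, for example spreading work across threads. The sketch below assumes a hypothetical `decode_batch` built on `concurrent.futures.ThreadPoolExecutor`, with `zlib.decompress` standing in for an image decoder. In CPython, threads speed this up only when the underlying decoder releases the GIL while it works, which is an assumption to verify for your library. In DataPipe terms this roughly corresponds to chaining `.batch(...)` and then `.map(decode_batch)`.

```python
import zlib
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group items into lists of up to batch_size (the role of .batch())."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def decode_batch(batch: List[T], decode_fn: Callable[[T], R],
                 executor: Executor) -> List[R]:
    """Hypothetical batch decoder: run decode_fn on each item across a thread pool."""
    # Executor.map returns results in input order.
    return list(executor.map(decode_fn, batch))


# Stand-in "encoded images": ten zlib-compressed byte strings.
payloads = [zlib.compress(bytes([i]) * 1000) for i in range(10)]

with ThreadPoolExecutor(max_workers=4) as executor:
    decoded_batches = [
        decode_batch(batch, zlib.decompress, executor)
        for batch in batched(payloads, batch_size=4)
    ]
```

If `decode_fn` holds the GIL for its whole run, this gains little over mapping `decode_fn` over images one at a time, which matches the answer above.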