
S3FileLoaderIterDataPipe buffer_size

Open commonism opened this issue 2 years ago • 0 comments

📚 The doc issue

The default S3 buffer size is 128 MB, i.e. 128 * (1024**2) bytes: https://github.com/pytorch/data/blob/a5b4720dece60565788ac4c9a85e01719188b28e/torchdata/csrc/pybind/S3Handler/S3Handler.cpp#L15

The example for S3FileLoaderIterDataPipe uses a buffer_size of 256. https://github.com/pytorch/data/blob/a5b4720dece60565788ac4c9a85e01719188b28e/torchdata/datapipes/iter/load/s3io.py#L154

Using a 256-byte buffer degrades performance. The example also invites the assumption that buffer_size is given in megabytes, since 256 would then look like a doubling of the 128 MB default.

Suggest a potential alternative/fix

Document that buffer_size is given in bytes, and have the example use 256 * (1024**2) as the value.
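
As a rough sketch of the arithmetic behind this suggestion, the snippet below converts MiB to bytes. The helper `mib_to_bytes` and the constant names are hypothetical and not part of torchdata; the commented-out call only assumes that `S3FileLoader` accepts a `buffer_size` keyword, as the linked docstring suggests, and the bucket name is a placeholder.

```python
MIB = 1024 ** 2  # number of bytes in one mebibyte


def mib_to_bytes(mib: int) -> int:
    """Convert a size in MiB to bytes (hypothetical helper, not a torchdata API)."""
    return mib * MIB


# The default cited in this issue, and the value proposed for the docs example.
DEFAULT_BUFFER_SIZE = mib_to_bytes(128)
EXAMPLE_BUFFER_SIZE = mib_to_bytes(256)

print(DEFAULT_BUFFER_SIZE)  # 134217728
print(EXAMPLE_BUFFER_SIZE)  # 268435456

# Assumed usage (not executed here; requires torchdata built with S3 support):
# from torchdata.datapipes.iter import IterableWrapper, S3FileLoader
# urls = IterableWrapper(["s3://bucket-name/folder/file"])  # placeholder URL
# files = S3FileLoader(urls, buffer_size=EXAMPLE_BUFFER_SIZE)
```

Passing a plain 256 instead would request a buffer roughly a million times smaller than the one the example seems to intend.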

commonism, Nov 17 '23 09:11