data issues

Data loader directly use loaded state dict when save right after load

1

Summary: If no iterator is created between the `load_state_dict` and `state_dict` calls, we should be able to return the original state dict directly without triggering the reading service because...

MiiiraK

CLA Signed

fb-exported

Dataloader is slow with iterdatapipes and shuffle that has large in-memory fields (because traverse_dps is slow)

3

### 🐛 Describe the bug Hello, I found that a standard DataLoader takes unreasonably long to construct itself and to load the first batch if there is a field in...

olegsinavski

module: dataloader

triaged

prefetcher shutdown hang when running with multiprocess + distributed reading service

1

### 🐛 Describe the bug Prefetcher will hang indefinitely on shutdown(); the faulthandler stack traces indicate that the main thread is blocked on https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/prefetcher.py#L113 while the child thread is blocked on https://github.com/pytorch/data/blob/main/torchdata/datapipes/iter/util/prefetcher.py#L81,...

zhengwy888

fix typo in _assert_portalocker

1

### Changes - change `portalocker>=2/0.0` to `portalocker>=2.0.0`

Sciroccogti

CLA Signed

`v2.1.2+cu118` and `v2.1.1+cu118` run into torchdata `ImportError: libssl.so.3: cannot open shared object file: No such file or directory`, that `v2.1.0+cu118` doesn't have an issue with

1

### 🐛 Describe the bug We are noticing a strange error specifically when using torch2.1.1+cu118 and torch2.1.2+cu118, which is not an issue with torch2.1.0+cu118. The error looks like this:...

justinxzhao

Iterating a data pipe, created with random split, ends in error as the code tries to iterate past the data pipe length

### 🐛 Describe the bug Iterating through a data pipe generated by a random split iterates correctly through all the data it is supposed to, but unfortunately it does...

thecaptain2000

S3FileLoaderIterDataPipe buffer_size

### 📚 The doc issue The default for S3 buffer size is 128 MB - or 128 * (1024**2) https://github.com/pytorch/data/blob/a5b4720dece60565788ac4c9a85e01719188b28e/torchdata/csrc/pybind/S3Handler/S3Handler.cpp#L15 The example for S3FileLoaderIterDataPipe uses a buffer_size of 256. https://github.com/pytorch/data/blob/a5b4720dece60565788ac4c9a85e01719188b28e/torchdata/datapipes/iter/load/s3io.py#L154...

commonism
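To make the size mismatch described in the S3FileLoaderIterDataPipe report concrete, the arithmetic can be sketched as below. This is only an illustration of the two numbers quoted in the issue; it assumes both values are in bytes, which the truncated issue text does not confirm, and the variable names are hypothetical.

```python
# Illustrative arithmetic only; variable names are hypothetical and the
# unit (bytes) is an assumption not confirmed by the truncated issue text.
MIB = 1024 ** 2

# Default quoted in the issue: 128 * (1024**2)
default_buffer_size = 128 * MIB

# Value used in the documentation example, per the issue
example_buffer_size = 256

# How many times larger the default is than the example value
ratio = default_buffer_size // example_buffer_size

print(default_buffer_size)  # 134217728
print(ratio)                # 524288
```

If the assumption about units holds, the documented example value would be several orders of magnitude smaller than the default.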

Support for recursive datatypes

7

Since the release of mypy 0.981, recursive types are supported; I have just removed the `#` as per the suggestion in the TODO comment and have changed the mypy version in the...

abhi-glitchhg

CLA Signed

Add torch 2.1.0 to Version Compatibility

3

This updates README.md with the latest PyTorch version 2.1.0 and torchdata version 0.7.0

atalman

CLA Signed

Use GitHub M1 runners

1

To build and test conda and wheel for release.

huydhn

CLA Signed
