Erjia Guan

Results 35 issues of Erjia Guan

Fixes: https://github.com/pytorch/data/issues/718

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#82975 [DataLoader] BC shuffle for MapDataPipe**
* #82974 [DataPipe] Align shuffling behavior for IterDataPipe and MapDataPipe

Add shuffling logic for `MapDataPipe` when using `DataLoader`
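As a rough illustration of what shuffling a map-style (index-addressable) dataset involves, here is a minimal stdlib-only sketch. It is not the code from the PR: the helper name `shuffled_items` and the explicit `seed` parameter are assumptions made for this example, and no `torch` API is used.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_items(data: Sequence[T], seed: int) -> Iterator[T]:
    """Yield every item of an index-addressable dataset in a seeded random order.

    Hypothetical helper for illustration only; not part of torch or torchdata.
    """
    # Shuffle the indices rather than the data, so the dataset itself is untouched.
    indices: List[int] = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    for i in indices:
        yield data[i]


data = ["a", "b", "c", "d", "e"]
epoch0 = list(shuffled_items(data, seed=0))
epoch0_again = list(shuffled_items(data, seed=0))
```

Shuffling indices instead of items keeps the sketch valid for any object that supports `__len__` and `__getitem__`, and reusing the same seed reproduces the same order.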

cla signed

Before this PR, if we lazily read data from each file in a RAR archive, the `fd` would move to the wrong location before reading started. After this PR, we reset `fd`...

### 🚀 The feature

When we generate the interface of the functional API, we can scan the `__init__` function for all type annotations and default variables that are imported from somewhere by...

Better Engineering

### 🚀 The feature

This issue is used to track TODOs in my mind for DevInfra:

### CI

- [x] Compare released git_version with the commit on the top of...

Better Engineering

### 🚀 The feature

Add `SingleProcessReadingService` to adapt the graph for the sake of:
- Shuffle seed setting per epoch
- Set shuffle/sharding
- etc.

Pros: This would prevent making...

### 🐛 Describe the bug

While working on a fix for the problem with unhashable DataPipe in https://github.com/pytorch/pytorch/pull/80509, I found that this test is broken: https://github.com/pytorch/pytorch/blob/e266bea79395399d60bd3c684545f69ae6900236/test/test_datapipe.py#L2307-L2349

The failure is: `TypeError: cannot pickle...

### 📚 The doc issue

The examples for Text/Vision/Audio are out-of-date: https://github.com/pytorch/data/tree/main/examples

The colab attached in README needs to be updated as well:
- How to install torchdata
- Example...

### 🐛 Describe the bug

I am trying to understand what would be the expected result when we traverse a `DataPipe` graph containing circular references. Assume we have the following...
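To make the circular-reference concern concrete, here is a minimal sketch of a traversal that terminates on a cyclic graph by remembering the nodes it has already visited. It is a generic illustration, not torchdata's traversal code: the `Node` class, its `sources` attribute, and the `reachable` function are hypothetical names introduced for this example.

```python
from typing import List, Set


class Node:
    """Hypothetical stand-in for a pipeline stage that references other stages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sources: List["Node"] = []


def reachable(root: Node) -> List[str]:
    """Return the names of all nodes reachable from root, visiting each once."""
    seen: Set[int] = set()  # ids of visited nodes; works even for unhashable objects
    order: List[str] = []
    stack: List[Node] = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue  # already visited: this is what breaks the cycle
        seen.add(id(node))
        order.append(node.name)
        stack.extend(node.sources)
    return order


# Build a cycle: a -> b -> c -> a
a, b, c = Node("a"), Node("b"), Node("c")
a.sources.append(b)
b.sources.append(c)
c.sources.append(a)

names = reachable(a)
```

Without the `seen` check the loop would never finish on this graph. Keying the set on `id(node)` rather than on the node itself is a deliberate choice in this sketch so that it does not require the nodes to be hashable.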

### 🐛 Describe the bug

There are a couple of places where `DataLoader2` uses `IterDataPipe` as the type hint ([here](https://github.com/pytorch/data/blob/12cfaf8899b1337981cd4edf9deef127f925f1bd/torchdata/dataloader2/dataloader2.py#L22), [here](https://github.com/pytorch/data/blob/12cfaf8899b1337981cd4edf9deef127f925f1bd/torchdata/dataloader2/reading_service.py#L17), etc.). We should add a type called `DataPipe =...