data
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.
### 🚀 The feature When we generate the interface of the functional API, we can scan the `__init__` function for all type annotations and default variables that are imported from somewhere by...
### 🚀 The feature This issue is used to track TODOs in my mind for DevInfra: ### CI - [x] Compare released git_version with the commit on the top of...
### 🚀 The feature The existing cacheholder leverages a Python dictionary
```python
@functional_datapipe("in_memory_cache")
class InMemoryCacheHolderMapDataPipe(MapDataPipe[T_co]):
    def __init__(self, source_dp: MapDataPipe[T_co]) -> None:
        self.source_dp: MapDataPipe[T_co] = source_dp
        self.cache: Dict[Any, T_co] = {}
        ...
```
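As a rough illustration of the dictionary-backed caching described in the issue above, here is a minimal plain-Python sketch. It does not use torchdata, and `DictCachedMap` and its `loads` counter are hypothetical names introduced only for this example.

```python
from typing import Any, Dict, Mapping


class DictCachedMap:
    """Hypothetical stand-in for a map-style source wrapped with a dict cache."""

    def __init__(self, source: Mapping[Any, Any]) -> None:
        self.source = source
        self.cache: Dict[Any, Any] = {}
        self.loads = 0  # counts reads that actually reach the source

    def __getitem__(self, key: Any) -> Any:
        # Read from the source only on the first access to a key.
        if key not in self.cache:
            self.cache[key] = self.source[key]
            self.loads += 1
        return self.cache[key]

    def __len__(self) -> int:
        return len(self.source)


cached = DictCachedMap({"a": 1, "b": 2})
first = cached["a"]
second = cached["a"]  # served from the cache, no second source read
```

Because the dictionary in this sketch is unbounded, every distinct key that is read stays in memory for the lifetime of the object.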
### 🚀 The feature Allow users to override existing functional APIs on their own DataPipe subclasses. ### Motivation, pitch Consider the use case where I iterate over a tensor in...
### 📚 The doc issue I was trying `torchdata` 0.4.0 and I found that shuffling with data pipes will always yield the same result across different epochs, unless I shuffle...
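One common way to get a different but reproducible order in each epoch is to derive the shuffle seed from the epoch number. The sketch below is plain Python rather than torchdata's API, and `shuffled_for_epoch`, `base_seed`, and `epoch` are hypothetical names used only to illustrate the idea.

```python
import random
from typing import List, Sequence


def shuffled_for_epoch(items: Sequence[int], base_seed: int, epoch: int) -> List[int]:
    """Return a shuffled copy of `items` using a seed derived from the epoch."""
    # A dedicated generator keeps the result independent of global random state.
    rng = random.Random(base_seed + epoch)
    order = list(items)
    rng.shuffle(order)
    return order


data = list(range(10))
epoch0 = shuffled_for_epoch(data, base_seed=0, epoch=0)
epoch0_again = shuffled_for_epoch(data, base_seed=0, epoch=0)
epoch1 = shuffled_for_epoch(data, base_seed=0, epoch=1)
```

Repeating the call for the same epoch reproduces the same order, while a different epoch number seeds a different generator and will generally produce a different permutation.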
### 🚀 The feature Add `SingleProcessReadingService` to adapt the graph for the sake of: - Shuffle seed setting per epoch - Set shuffle/sharding - etc. Pros: This would prevent making...
### 🐛 Describe the bug I am running into an issue where the dataloaders hang for 5 seconds per dataloader worker on exit. I have `persistent_workers=True` and `pin_memory=True`. Here is...
For user dev and onboarding experience of the data component, we will provide examples, tutorials, up-to-date documentation, and operational support. We added a simple train loop example....
### 🚀 The feature The `ParquetDataFrameLoader` allows us to read parquet files from the local file system, but I don't think it supports reading parquet files from (for example) an...