data issues

Auto import variables for type annotation and default value for functional API during pyi gen

### 🚀 The feature When we generate the interface of functional API, we can scan `__init__` function for all type annotation and default variables that are imported from somewhere by...

ejguan

Better Engineering

Tracker for DevInfra Improvement

4

### 🚀 The feature This issue is used to track TODOs in my mind for DevInfra: ### CI - [x] Compare released git_version with the commit on the top of...

ejguan

Better Engineering

Support Key/Value databases

7

### 🚀 The feature The existing cacheholder leverages a python dictionary

```python
@functional_datapipe("in_memory_cache")
class InMemoryCacheHolderMapDataPipe(MapDataPipe[T_co]):
    def __init__(self, source_dp: MapDataPipe[T_co]) -> None:
        self.source_dp: MapDataPipe[T_co] = source_dp
        self.cache: Dict[Any, T_co] = {}
        ...
```

msaroufim

enhancement

feature
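To make the key/value request above more concrete, here is a minimal sketch of the same caching pattern with the in-memory dictionary swapped for an on-disk store. It assumes the standard library's `shelve` module as a stand-in for a real key/value database. `ShelveCacheHolder`, its `misses` counter, and `demo` are hypothetical names for illustration, not part of torchdata.

```python
import os
import shelve
import tempfile
from typing import Any, Mapping, Tuple


class ShelveCacheHolder:
    """Hypothetical map-style wrapper that caches lookups in an on-disk store.

    Not a torchdata class: it only mirrors the shape of the in-memory cache
    holder quoted in the issue, replacing the dict with `shelve`.
    """

    def __init__(self, source: Mapping[Any, Any], path: str) -> None:
        self.source = source
        self.cache = shelve.open(path)  # persistent dict-like store (stdlib)
        self.misses = 0  # number of lookups that had to reach the source

    def __getitem__(self, key: Any) -> Any:
        k = repr(key)  # shelve keys must be strings
        if k not in self.cache:
            self.misses += 1
            self.cache[k] = self.source[key]
        return self.cache[k]

    def __len__(self) -> int:
        return len(self.source)

    def close(self) -> None:
        self.cache.close()


def demo() -> Tuple[str, str, int]:
    """Look up the same key twice; only the first lookup reaches the source."""
    with tempfile.TemporaryDirectory() as tmp:
        holder = ShelveCacheHolder({0: "a", 1: "b"}, os.path.join(tmp, "cache"))
        try:
            first, second = holder[1], holder[1]
            return first, second, holder.misses
        finally:
            holder.close()
```

A real backend would need its own key serialization and eviction policy. `repr(key)` is used here only because `shelve` requires string keys.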

Allow overriding of existing functional APIs on DataPipe subclasses

1

### 🚀 The feature Allow users to override existing functional APIs on their own DataPipe subclasses. ### Motivation, pitch Consider the use case where I iterate over a tensor in...

BarclayII

Recommended practice to shuffle data with datapipes differently every epoch

4

### 📚 The doc issue I was trying `torchdata` 0.4.0 and I found that shuffling with data pipes will always yield the same result across different epochs, unless I shuffle...

BarclayII
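One general way to get a different but reproducible order in every epoch is to derive the shuffle seed from the epoch number. The sketch below uses only the standard library and is independent of torchdata's API. `shuffled_epoch`, `base_seed`, and `epoch` are hypothetical names, and this is not necessarily the practice the maintainers recommend.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_epoch(data: Sequence[T], base_seed: int, epoch: int) -> List[T]:
    """Return `data` in an order determined by the pair (base_seed, epoch)."""
    rng = random.Random(base_seed + epoch)  # fresh generator for each epoch
    items = list(data)  # copy so the caller's data is left untouched
    rng.shuffle(items)
    return items


def epoch_orders(data: Sequence[T], base_seed: int, num_epochs: int) -> List[List[T]]:
    """Collect the shuffled order for epochs 0 .. num_epochs - 1."""
    return [shuffled_epoch(data, base_seed, e) for e in range(num_epochs)]
```

Because the seed depends only on `base_seed` and `epoch`, rerunning a given epoch reproduces its order, while consecutive epochs see different permutations.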

Add `SingleProcessReadingService` for `DataLoader2` by default

### 🚀 The feature Add `SingleProcessReadingService` to adapt the graph for the sake of: - Shuffle seed setting per epoch - Set shuffle/sharding - etc. Pros: This would prevent making...

ejguan

Dataloader Hangs on Exit

5

### 🐛 Describe the bug I am running into an issue where the dataloaders hang for 5 seconds per dataloader worker on exit. I have `persistent_workers=True` and `pin_memory=True`. Here is...

ravi-mosaicml

module: dataloader

triaged

Will TorchData provide a performance comparison with DataLoaderV1?

1

MARD1NO

topic: performance

DataLoader2 with reading service

To improve the developer and onboarding experience for the data component, we will provide examples, tutorials, up-to-date documentation, and operational support. We added a simple train loop example....

dahsh

documentation

Read Parquet Files Directly from S3?

2

### 🚀 The feature The `ParquetDataFrameLoader` allows us to read parquet files from the local file system, but I don't think it supports reading parquet files from (for example) an...

vedantroy

enhancement

feature
