DataLoader2 with multiprocess raise exception: Can not request next item while we are still waiting response for previous request

Open npuichigo opened this issue 1 year ago • 4 comments

🐛 Describe the bug

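from torchdata.dataloader2 import DataLoader2, MultiProcessingReadingService

dp = ...  # any IterDataPipe (elided in the original report)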
dp = dp.sharding_filter()
rs = MultiProcessingReadingService(num_workers=4)
dataloader = DataLoader2(dp, reading_service=rs)

for _ in dataloader:
    pass

dataloader.shutdown()

Exception: Can not request next item while we are still waiting response for previous request
This exception is thrown by __iter__ of _IterateQueueDataPipes(datapipes=[QueueWrapper, QueueWrapper, QueueWrapper, QueueWrapper])

Versions

[pip3] numpy==1.26.4
[pip3] onnx==1.15.0
[pip3] onnxconverter-common==1.13.0
[pip3] onnxruntime==1.15.1
[pip3] skl2onnx==1.16.0
[pip3] torch==2.2.1
[pip3] torchaudio==2.2.1
[pip3] torchdata==0.7.1
[pip3] torchvision==0.17.1
[pip3] triton==2.2.0
[conda] numpy                     1.26.4                   pypi_0    pypi
[conda] torch                     2.2.1                    pypi_0    pypi
[conda] torchaudio                2.2.1                    pypi_0    pypi
[conda] torchvision               0.17.1                   pypi_0    pypi
[conda] triton

npuichigo • Mar 12 '24 09:03

I have noticed this happens whenever the number of workers you specify for the MultiProcessingReadingService is greater than the number of elements that can be yielded from the dp before sharding.
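To make that condition concrete, here is a rough pure-Python sketch (not torchdata code). It assumes sharding_filter splits elements round-robin by index, which is how I understand it to behave. The helper names shard_for_worker, shards_by_worker and empty_workers are made up for illustration. It only shows which workers end up with nothing to yield. Whether an empty shard is what actually triggers the exception is an observation, not something confirmed here.

```python
from typing import Dict, List, Sequence


def shard_for_worker(items: Sequence[int], num_workers: int, worker_id: int) -> List[int]:
    """Keep the items whose position maps to this worker (round-robin by index).

    Assumption: this mirrors an index-modulo split; it is not the library's code.
    """
    return [item for index, item in enumerate(items) if index % num_workers == worker_id]


def shards_by_worker(items: Sequence[int], num_workers: int) -> Dict[int, List[int]]:
    """Map each worker id to the items it would receive."""
    return {w: shard_for_worker(items, num_workers, w) for w in range(num_workers)}


def empty_workers(items: Sequence[int], num_workers: int) -> List[int]:
    """Return the ids of workers that would receive no items at all."""
    return [w for w, shard in shards_by_worker(items, num_workers).items() if not shard]


if __name__ == "__main__":
    # 3 elements, 4 workers: the last worker has an empty shard.
    print(shards_by_worker(range(3), 4))
    print(empty_workers(range(3), 4))
    # 8 elements, 4 workers: every worker gets something.
    print(empty_workers(range(8), 4))
```

So with fewer elements than workers, at least one worker has nothing to yield. Checking len(list(dp)) against num_workers before building the DataLoader2 might help narrow down whether this is the case you are hitting.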

jdenhof • Mar 26 '24 05:03

@jdenhof I checked this. Even when the number of workers specified for the MultiProcessingReadingService is smaller than the number of elements that can be yielded from the dp before sharding, I still hit this issue.

qmpzzpmq • Nov 07 '24 09:11

I also noticed this issue at the end/start of the first epoch. Any fix?

ds2268 • Nov 09 '24 07:11

@ds2268 I fixed it in https://github.com/pytorch/data/pull/1311

qmpzzpmq • Nov 13 '24 06:11