Cannot use num_workers and prefetch_factor when using StatefulDataLoader (use_stateful_dataloader=True)
System Info
- `Accelerate` version: 0.34.2
- Platform: Linux-5.15.0-1057-aws-x86_64-with-glibc2.31
- `accelerate` bash location: /fsx/umar/miniconda3/envs/memory-efficient-transformers/bin/accelerate
- Python version: 3.10.14
- Numpy version: 1.26.4
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 1999.99 GB
- GPU type: NVIDIA H100 80GB HBM3
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
from torch.utils.data import DataLoader
from transformers import DataCollatorForLanguageModeling
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Accelerator is created with the stateful wrapper enabled (flag from the issue
# title); the exact setup in my original script is omitted here.
accelerator = Accelerator(
    dataloader_config=DataLoaderConfiguration(use_stateful_dataloader=True)
)

dataset_streaming = True
ds_train = ...  # Dataset loaded with streaming=True
train_batch_size = 12
collator = DataCollatorForLanguageModeling(...)
dataloader_num_workers = 4
dataloader_prefetch_factor = 10

dl_trainer = DataLoader(
    ds_train,
    batch_size=train_batch_size,
    collate_fn=collator,
    shuffle=not dataset_streaming,
    drop_last=True,
    num_workers=dataloader_num_workers,
    prefetch_factor=dataloader_prefetch_factor,
    pin_memory=True,
)

# model, optimizer, scheduler and dl_eval are defined elsewhere in the script.
model, optimizer, scheduler, dl_eval, dl_trainer = accelerator.prepare(
    model, optimizer, scheduler, dl_eval, dl_trainer
)

for _, batch in enumerate(dl_trainer):
    training_loop()
A DataLoader initialized with num_workers > 0 produces the following error when iterating through the wrapped DataLoader:
[rank0]: for _, batch in batch_enumerator:
[rank0]: File "/fsx/umar/miniconda3/envs/memory-efficient-transformers/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
[rank0]: for obj in iterable:
[rank0]: File "/fsx/umar/miniconda3/envs/memory-efficient-transformers/lib/python3.10/site-packages/accelerate/data_loader.py", line 798, in __iter__
[rank0]: next_batch, next_batch_info = self._fetch_batches(main_iterator)
[rank0]: File "/fsx/umar/miniconda3/envs/memory-efficient-transformers/lib/python3.10/site-packages/accelerate/data_loader.py", line 751, in _fetch_batches
[rank0]: self._update_state_dict()
[rank0]: File "/fsx/umar/miniconda3/envs/memory-efficient-transformers/lib/python3.10/site-packages/accelerate/data_loader.py", line 479, in _update_state_dict
[rank0]: self.adjust_state_dict_for_prefetch()
[rank0]: File "/fsx/umar/miniconda3/envs/memory-efficient-transformers/lib/python3.10/site-packages/accelerate/data_loader.py", line 459, in adjust_state_dict_for_prefetch
[rank0]: if self.dl_state_dict["_sampler_iter_yielded"] > 0:
[rank0]: KeyError: '_sampler_iter_yielded'
I also tried with the latest development version of accelerate (https://github.com/huggingface/accelerate@9f9951325c69f0a6c7c8ab00df2ab8af23b3c1fa) but I still get the same error.
@muellerzr is aware of this issue.
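For what it's worth, the mismatch seems to be in the layout of the state dict itself. Here is a minimal sketch (my assumption, poking torchdata's StatefulDataLoader directly, which is the class accelerate wraps when use_stateful_dataloader=True) that compares the state-dict keys with and without workers; with workers, the flat _sampler_iter_yielded key that adjust_state_dict_for_prefetch indexes does not seem to exist at the top level:

# Sketch only; key names other than '_sampler_iter_yielded' (taken from the
# traceback above) are not asserted, just printed for comparison.
from torchdata.stateful_dataloader import StatefulDataLoader

def show_state_keys(num_workers):
    kwargs = {"num_workers": num_workers}
    if num_workers > 0:
        kwargs["prefetch_factor"] = 2
    dl = StatefulDataLoader(list(range(64)), batch_size=4, **kwargs)
    it = iter(dl)  # keep the iterator alive so state_dict() reflects it
    next(it)
    print(f"num_workers={num_workers}:", sorted(dl.state_dict().keys()))

if __name__ == "__main__":
    show_state_keys(0)  # single-process layout: '_sampler_iter_yielded' is top level
    show_state_keys(2)  # multi-worker layout differs, hence the KeyError above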
Expected behavior
I'd like to be able to prefetch multiple batches, which is only possible by setting num_workers to a value greater than 0.
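For context on the prefetching point: plain torch.utils.data.DataLoader only accepts prefetch_factor in multiprocessing mode, and roughly prefetch_factor * num_workers batches are loaded ahead. A small sketch illustrating this (using a toy list as the dataset):

# prefetch_factor is tied to worker processes in torch.utils.data.DataLoader.
from torch.utils.data import DataLoader

data = list(range(32))

try:
    # Rejected: prefetching requires multiprocessing workers.
    DataLoader(data, batch_size=4, num_workers=0, prefetch_factor=2)
except ValueError as err:
    print(err)

# Accepted: roughly prefetch_factor * num_workers batches are kept in flight.
dl = DataLoader(data, batch_size=4, num_workers=2, prefetch_factor=4)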
@muellerzr Hi, wondering if there has been any progress on this bug. 👀 I also hit it when trying the latest accelerate.
Same here. Any progress on this one? Using this flag with `dataloader_num_workers` set to anything other than 0 triggers this error! It would be nice to be able to save the state of the data loader AND not be hobbled by serial (num_workers=0) data loading.
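In the meantime, the only workarounds I can think of (an assumption on my part, not a confirmed fix) are to keep num_workers=0 while the stateful wrapper is on, or to disable use_stateful_dataloader and give up resumable dataloader state:

# Assumed interim workarounds, not an official fix.
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

# Option 1: keep use_stateful_dataloader=True but build the DataLoader with
# num_workers=0 (no prefetching), so the single-process state-dict layout is used.
accelerator = Accelerator(
    dataloader_config=DataLoaderConfiguration(use_stateful_dataloader=True)
)

# Option 2: keep num_workers/prefetch_factor and turn the stateful wrapper off,
# at the cost of not being able to checkpoint/resume the dataloader position.
# accelerator = Accelerator(
#     dataloader_config=DataLoaderConfiguration(use_stateful_dataloader=False)
# )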