accelerate
Problem with Webdataset
System Info
- `Accelerate` version: 0.18.0
- Platform: Linux-5.4.0-139-generic-x86_64-with-glibc2.31
- Python version: 3.10.11
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.0.0+cu117 (True)
- `Accelerate` default config:
Not found
Information
- [ ] The official example scripts
- [X] My own modified scripts
Tasks
- [ ] One of the scripts in the examples/ folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- [X] My own task or dataset (give details below)
Reproduction
I tried to use webdataset with accelerate. After accelerate.prepare(), the prompts coming out of the webdataset dataloader are empty, while before accelerate.prepare() everything is normal (i.e. it gives the correct prompts).

How I set up webdataset:

    import torch
    import webdataset as wds

    dataset = (
        wds.WebDataset(url)
        .shuffle(100)
        .decode("pil")
        .to_tuple("jpg;png", "txt")
        .map_tuple(transform, lambda x: x)
    )
    dataloader = torch.utils.data.DataLoader(dataset, num_workers=4, batch_size=16)

I ran the following lines before and after accelerate.prepare():

    loader = iter(dataloader)
    image, prompt = next(loader)
    print(prompt)
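Since `url` is not shown, a tiny local shard could stand in for it when reproducing this. The sketch below is only an illustration under assumptions: the helper names (`tiny_png`, `write_shard`, `read_captions`), the sample keys, and the captions are all hypothetical and not part of the original script. It uses only the standard library and relies on the WebDataset convention that files sharing a basename inside a tar archive form one sample, so each `.png`/`.txt` pair should match `.to_tuple("jpg;png", "txt")`.

```python
import io
import struct
import tarfile
import zlib


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, tag, data, CRC over tag + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """Return the bytes of a valid 1x1 red RGB PNG (hypothetical placeholder image)."""
    # width=1, height=1, bit depth=8, color type=2 (RGB), default compression/filter/interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0)
    # One scanline: filter byte 0 followed by a single red pixel
    scanline = b"\x00" + b"\xff\x00\x00"
    return (
        b"\x89PNG\r\n\x1a\n"
        + _png_chunk(b"IHDR", ihdr)
        + _png_chunk(b"IDAT", zlib.compress(scanline))
        + _png_chunk(b"IEND", b"")
    )


def write_shard(path: str, captions: list[str]) -> None:
    """Write one image/caption pair per caption into a tar shard at `path`."""
    png = tiny_png()
    with tarfile.open(path, "w") as tar:
        for i, caption in enumerate(captions):
            key = f"sample{i:06d}"  # shared basename groups the two files into one sample
            for ext, payload in (("png", png), ("txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))


def read_captions(path: str) -> list[str]:
    """Read the captions back with plain tarfile, independent of any dataloader."""
    with tarfile.open(path) as tar:
        return [
            tar.extractfile(member).read().decode("utf-8")
            for member in tar.getmembers()
            if member.name.endswith(".txt")
        ]
```

For example, after `write_shard("tiny-shard.tar", ["a red pixel", "a second caption"])`, the path `"tiny-shard.tar"` could be passed as `url` in the snippet above, and `read_captions("tiny-shard.tar")` gives the captions to compare against whatever `print(prompt)` shows before and after accelerate.prepare().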
Expected behavior
The prompt should be the same before and after accelerate.prepare(). Instead, it becomes empty after accelerate.prepare().
cc @muellerzr
@dguo98 I'll need a full reproducer as I don't have any experience with webdataset to help debug this.
What is url here? What is transform? How are you launching your script? And how many GPUs does your system have?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I'm running into this problem too.
@dguo98 How did you solve the problem? Could you kindly share?