karamavusibrahim
```python
import torch
import webdataset as wds

url = ...  # my tar file paths
preprocess_wds = ...  # transform functions

train_dataset = wds.WebDataset(url, resampled=True).shuffle(1000)
train_dataset = train_dataset.decode("pil", handler=wds.warn_and_continue).to_tuple("jpg;png")
train_dataset = train_dataset.map(preprocess_wds)
train_dataset = train_dataset.with_epoch(10000)

train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    num_workers=12,
    batch_size=args.train_batch_size,
    shuffle=False,
    persistent_workers=True,
    collate_fn=collate_fn,
)  # it works
train_dataloader...
```
In my tests, webdataset + `wds.WebLoader` is faster than webdataset + `torch.utils.data.DataLoader`, so I am trying to switch to the webdataset + WebLoader configuration.
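For reference, a minimal sketch of what a webdataset + WebLoader setup might look like, mirroring the DataLoader pipeline above. The helper names `build_webloader` and `batches_per_epoch` are hypothetical, and treating `10000` as the total number of samples per epoch is an assumption, not something stated in the post. Batching is done inside the dataset with `.batched(...)`, so the loader is created with `batch_size=None`.

```python
def batches_per_epoch(samples_per_epoch: int, batch_size: int) -> int:
    """Number of full batches that make up one nominal epoch (hypothetical helper)."""
    return samples_per_epoch // batch_size


def build_webloader(url, preprocess, collate_fn, batch_size,
                    num_workers=12, samples_per_epoch=10000):
    """Sketch of a WebDataset + WebLoader pipeline (assumed configuration)."""
    # Imported here so this snippet can be loaded even where webdataset is absent.
    import webdataset as wds

    dataset = (
        wds.WebDataset(url, resampled=True)
        .shuffle(1000)
        .decode("pil", handler=wds.warn_and_continue)
        .to_tuple("jpg;png")
        .map(preprocess)
        # Batch inside the dataset; drop the last partial batch.
        .batched(batch_size, collation_fn=collate_fn, partial=False)
    )

    # batch_size=None because the dataset already yields complete batches.
    loader = wds.WebLoader(
        dataset,
        batch_size=None,
        num_workers=num_workers,
        persistent_workers=True,
    )

    # On the loader, with_epoch counts batches rather than individual samples.
    return loader.with_epoch(batches_per_epoch(samples_per_epoch, batch_size))
```

It would be called as `build_webloader(url, preprocess_wds, collate_fn, args.train_batch_size)`. Whether this matches the configuration that was actually benchmarked is not known from the post.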