Num-Workers-Search
num_workers Search Algorithm for Fast PyTorch DataLoader
Background
Finding the optimal num_workers for the PyTorch DataLoader is key to fast training.
I measured the total time needed to load all the training data of the CIFAR10 dataset with various num_workers values on my PC and on Google Colab.
on my PC
Spec
- CPU: i5-10400
- GPU: RTX 3070
- RAM: 32GB
- OS: Windows 10
- SSD: 1TB
on Colab
As you can see, finding the optimal num_workers is very important for fast training. But it is hard to pick the optimal num_workers from a formula, because it varies with PC specs, dataset size, and batch size.
Solution
Trial and Error
Searching for the optimal num_workers can take a long time, but it pays off by shortening the overall training time.
I use some tricks to reduce the execution time of this algorithm. If you are interested in this algorithm, refer to nws.py.
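To make the trial-and-error idea concrete, here is a minimal sketch of such a search. It is not the code in nws.py: the names `time_one_pass` and `search_num_workers`, the `max_batches` cap, and the `patience`-based early stop are all assumptions made for illustration, chosen as plausible ways to keep the search short.

```python
import time
from typing import Callable, Iterable, Optional


def time_one_pass(dataset, batch_size: int, num_workers: int,
                  max_batches: Optional[int] = None) -> float:
    """Return the seconds needed to iterate over (part of) the dataset."""
    # Imported here so the search logic below stays usable without PyTorch.
    from torch.utils.data import DataLoader

    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for i, _ in enumerate(loader):
        # Hypothetical shortcut: time only the first `max_batches` batches.
        if max_batches is not None and i + 1 >= max_batches:
            break
    return time.perf_counter() - start


def search_num_workers(measure: Callable[[int], float],
                       candidates: Iterable[int],
                       patience: int = 2) -> int:
    """Try each candidate in order and return the fastest one seen.

    Stops early once `patience` consecutive candidates fail to beat
    the best time so far (an assumed heuristic, not taken from nws.py).
    """
    best_workers, best_time = None, float("inf")
    misses = 0
    for n in candidates:
        elapsed = measure(n)
        if elapsed < best_time:
            best_workers, best_time = n, elapsed
            misses = 0
        else:
            misses += 1
            if misses >= patience:
                break
    if best_workers is None:
        raise ValueError("candidates must not be empty")
    return best_workers
```

With a real dataset, this could be called as `search_num_workers(lambda n: time_one_pass(dataset, batch_size, n, max_batches=50), range(0, os.cpu_count() + 1))`. On Windows, code that uses num_workers greater than 0 should run under an `if __name__ == "__main__":` guard, because worker processes are started by re-importing the main script.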
In-script workflow
import torch
import nws
batch_size = ...
dataset = ...
num_workers = nws.search(dataset=dataset,
                         batch_size=batch_size,
                         ...)
loader = torch.utils.data.DataLoader(dataset=dataset,
                                     batch_size=batch_size,
                                     ...,
                                     num_workers=num_workers,
                                     ...)