insightface
Training speed was not accelerated with DALI
Training: 412-Speed 556.82 samples/sec Loss 42.4856 LearningRate 0.001128 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 256 Required: 426 hours
Setup: 2 GPUs, batch_size=320, WebFace42M, num_workers=16, sample_rate=0.2, and dali_data_iter(..., num_threads=16, ...). My environment has CUDA 10.1, so I installed nvidia-dali-cuda102 / nvidia-dali-cuda100. Why is training not faster?
def get_dataloader(
    root_dir,
    local_rank,
    batch_size,
    dali=False,
    seed=2048,
    num_workers=8,
) -> Iterable:
    rec = os.path.join(root_dir, 'train.rec')
    idx = os.path.join(root_dir, 'train.idx')
    train_set = None

    # DALI
    if dali:
        return dali_data_iter(
            batch_size=batch_size, rec_file=rec, idx_file=idx,
            num_threads=16, local_rank=local_rank)
Note that num_workers is not used on the DALI path; the DALI branch returns early and only its own num_threads applies.
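To make the dispatch concrete, here is a minimal, self-contained sketch of the same early-return pattern. The bodies of dali_data_iter and the PyTorch loader path are hypothetical stand-ins (the real ones live in insightface); the point is only that num_workers never reaches the DALI branch:

```python
from typing import Iterable

def dali_data_iter(batch_size, rec_file, idx_file, num_threads, local_rank):
    # Stand-in for insightface's DALI iterator: threading is controlled
    # by num_threads, not by the DataLoader's num_workers.
    return f"DALI iterator: bs={batch_size}, threads={num_threads}"

def torch_loader(batch_size, num_workers):
    # Stand-in for the ordinary PyTorch DataLoader path.
    return f"DataLoader: bs={batch_size}, workers={num_workers}"

def get_dataloader(root_dir, local_rank, batch_size,
                   dali=False, num_workers=8) -> Iterable:
    rec = root_dir + "/train.rec"
    idx = root_dir + "/train.idx"
    if dali:
        # Early return: num_workers is ignored from here on.
        return dali_data_iter(batch_size=batch_size, rec_file=rec,
                              idx_file=idx, num_threads=16,
                              local_rank=local_rank)
    return torch_loader(batch_size=batch_size, num_workers=num_workers)

print(get_dataloader("/data", 0, 320, dali=True))
print(get_dataloader("/data", 0, 320, dali=False))
```

So tuning num_workers has no effect once dali=True; only num_threads (and DALI's internal prefetching) matters on that path.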
I found that DALI did not accelerate training either, but it did stabilize the speed. My speed used to fluctuate between roughly 1000 and 2000 iter/s, but with DALI it held steady at about 1800 (T4, 8 GPUs).
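That stabilization can be quantified as a drop in throughput variance rather than a higher mean. A minimal sketch with made-up throughput numbers (the real values would come from the training log's Speed field):

```python
import statistics

# Hypothetical per-interval throughputs (iter/s), illustrative only:
# fluctuating without DALI vs. steady with DALI, similar means.
without_dali = [1000, 2000, 1200, 1900, 1100, 1800]
with_dali = [1800, 1800, 1790, 1810, 1800, 1795]

# Similar average throughput, but far lower spread with DALI.
print(f"mean without DALI: {statistics.mean(without_dali):.0f}, "
      f"stdev: {statistics.pstdev(without_dali):.0f}")
print(f"mean with DALI:    {statistics.mean(with_dali):.0f}, "
      f"stdev: {statistics.pstdev(with_dali):.0f}")
```

A steadier pipeline usually means the GPU is no longer intermittently starved by the CPU-side loader, even if peak throughput is unchanged.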
oh, thanks!