insightface Training speed was not accelerated with DALI

Open eeric opened this issue 2 years ago • 3 comments

Training: 412-Speed 556.82 samples/sec Loss 42.4856 LearningRate 0.001128 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 256 Required: 426 hours

Setup: 2 GPUs, bs=320, wf42m, num_workers=16, sample_rate=0.2, dali_data_iter(..., num_threads=16, ...). CUDA 10.1, with nvidia-dali-cuda102 / nvidia-dali-cuda100 installed. Why is there no speedup?

Jun 11 '22 04:06 eeric

import os
from typing import Iterable

def get_dataloader(
    root_dir, local_rank, batch_size,
    dali=False, seed=2048, num_workers=8,
) -> Iterable:
    rec = os.path.join(root_dir, 'train.rec')
    idx = os.path.join(root_dir, 'train.idx')
    train_set = None

    # DALI: num_threads is hard-coded here; num_workers is not passed through
    if dali:
        return dali_data_iter(
            batch_size=batch_size, rec_file=rec, idx_file=idx,
            num_threads=16, local_rank=local_rank)

The DALI branch does not use num_workers.
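
One way to check whether data loading is the bottleneck at all is to time the iterator on its own, without the forward/backward pass, and compare the result with the samples/sec reported in the training log. A minimal sketch, assuming the loader is any iterable of batches; `measure_throughput` and the dummy loader are hypothetical names for illustration and are not part of insightface or DALI:

```python
import time


def measure_throughput(loader, batch_size, max_batches=100):
    """Return samples/sec obtained by iterating `loader` alone (no model work)."""
    start = time.perf_counter()
    n = 0
    for n, _batch in enumerate(loader, start=1):
        if n >= max_batches:
            break
    elapsed = time.perf_counter() - start
    if n == 0:
        return 0.0  # empty loader: nothing was measured
    return n * batch_size / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Dummy stand-in for a real loader (assumption: each item is one batch).
    dummy_loader = (list(range(320)) for _ in range(50))
    rate = measure_throughput(dummy_loader, batch_size=320, max_batches=50)
    print(f"loader-only throughput: {rate:.1f} samples/sec")
```

If the loader-only rate is already well above the training speed, the GPU side is more likely the limit, and a faster loader would probably not change the overall speed much.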

Jun 11 '22 07:06 eeric

I also found that DALI did not accelerate training, but it did stabilize the speed. My speed used to fluctuate between 1000 and 2000 iter/s; with DALI it stays fixed at about 1800 (T4, 8 GPUs).

Jun 14 '22 02:06 jacqueline-weng

oh, thanks!

Jun 14 '22 03:06 eeric
