Investigate why the dropout layer makes training significantly slower.
@vysarge has run a number of benchmark and profiling experiments on Merlin Models using synthetic data. In particular, she compared the Merlin Models DLRM with the JoC DLRM TF implementation; the results of these experiments can be found in this spreadsheet (NVIDIA internal only).
She noticed that with the dropout layer enabled (rate 0.08), each MM iteration takes 10.1 ms longer. The majority of this time is attributable to additional calls to `Mul_GPU_DT_FLOAT_DT_FLOAT_kernel`.
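The extra Mul kernel calls are consistent with how dropout is commonly implemented: an elementwise mask multiply (plus a rescale), applied in the forward pass and again when computing gradients. A minimal NumPy sketch of inverted dropout (an illustration of the technique, not the Merlin Models or TF implementation) shows where those multiplies come from:

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(x, rate=0.08):
    """Inverted dropout: zero out a fraction `rate` of activations and
    rescale the survivors by 1/(1 - rate) so the expected value is unchanged.
    The elementwise multiplies below are what show up as extra Mul kernel
    launches on GPU (once forward, and again for the gradient)."""
    keep_prob = 1.0 - rate
    mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
    return x * mask / keep_prob

x = np.ones((4, 8), dtype=np.float32)
y = inverted_dropout(x, rate=0.08)
# Surviving activations are scaled by 1/0.92; dropped ones are exactly 0.
print(np.unique(np.round(y.astype(np.float64), 3)))
```

Because the mask multiply touches every activation, its cost scales with the size of the layers it is attached to, which is one plausible reason a nominally cheap layer adds measurable per-iteration time.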