Investigate why the dropout layer makes training significantly slower.
@vysarge has run a number of benchmark and profiling experiments on Merlin Models using synthetic data. In particular, she compared the Merlin Models DLRM with the JoC DLRM TF implementation; the results of these experiments can be found in this spreadsheet (NVIDIA internal only).
She noticed that with the dropout layer enabled (rate 0.08), each MM iteration takes 10.1 ms longer. The majority of this time is attributable to additional calls to `Mul_GPU_DT_FLOAT_DT_FLOAT_kernel`.
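The extra Mul kernel calls are consistent with how dropout is commonly implemented: an elementwise mask multiply (plus a rescale), applied in the forward pass and again when computing gradients. A minimal NumPy sketch of inverted dropout (an illustration of the technique, not the Merlin Models or TF implementation) shows where those multiplies come from:

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(x, rate=0.08):
    """Inverted dropout: zero out a fraction `rate` of activations and
    rescale the survivors by 1/(1 - rate) so the expected value is unchanged.
    The elementwise multiplies below are what show up as extra Mul kernel
    launches on GPU (once forward, and again for the gradient)."""
    keep_prob = 1.0 - rate
    mask = (rng.random(x.shape) < keep_prob).astype(x.dtype)
    return x * mask / keep_prob

x = np.ones((4, 8), dtype=np.float32)
y = inverted_dropout(x, rate=0.08)
# Surviving activations are scaled by 1/0.92; dropped ones are exactly 0.
print(np.unique(np.round(y.astype(np.float64), 3)))
```

Because the mask multiply touches every activation, its cost scales with the size of the layers it is attached to, which is one plausible reason a nominally cheap layer adds measurable per-iteration time.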