cuda-profiler
dlprof tools
I ran DLProf on my deep learning model, and it reported that the data shapes did not meet the requirements for Tensor Cores. The original script defines five fully connected layers with shapes (8,1024), (1024,1024), (1024,512), (512,1), and uses batch=128. When I set batch=64, the issues DLProf had flagged were resolved. Why? I did not change the shape (512,1) to (512,8).
My code is on GitHub: https://github.com/fenfaqingnian/dlprof_v100/tree/master/Profiler_DLprof_TF1-master
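To make the shapes in question concrete, here is a minimal sketch of a shape check, assuming the commonly cited guideline that FP16 Tensor Core matrix multiplies want the batch size and the input and output feature counts to be multiples of 8. This is not DLProf's actual logic; the function name `misaligned_dims` and the constant `ALIGNMENT` are hypothetical and do not come from the linked repository.

```python
# Assumption: FP16 Tensor Core GEMMs want M, N and K to be multiples of 8.
# This is a rule-of-thumb check, not how DLProf decides what to report.
ALIGNMENT = 8


def misaligned_dims(batch, layer_shapes, alignment=ALIGNMENT):
    """Return (layer_index, dim_name, value) for every GEMM dimension
    that is not divisible by `alignment`."""
    problems = []
    for i, (in_features, out_features) in enumerate(layer_shapes):
        dims = {
            "M (batch)": batch,
            "K (in)": in_features,
            "N (out)": out_features,
        }
        for name, value in dims.items():
            if value % alignment != 0:
                problems.append((i, name, value))
    return problems


# Layer shapes as listed in the post.
layers = [(8, 1024), (1024, 1024), (1024, 512), (512, 1)]

for batch in (128, 64):
    print(batch, misaligned_dims(batch, layers))
```

Under this assumption, only the output dimension of the (512,1) layer is flagged, and it is flagged for batch=128 and batch=64 alike, so the divisibility rule by itself would not explain why the report changes with batch size.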