dlrover

[Feature]: Summarize the elapsed time of PyTorch ops in a training job.

Open workingloong opened this issue 2 years ago • 1 comments
Users usually find the bottleneck of a training pipeline by looking at the elapsed time of individual ops. If we summarize the elapsed time of ops automatically once training starts, we can detect the bottleneck without manual profiling, and then either mitigate it or give users suggestions.

workingloong, Sep 06 '23 07:09

import time

def train():
    for i, epoch in enumerate(range(start_epoch, end_epoch)):
        for train_sample in train_data_loader:
            start_time = time.time()
            ...  # doing... (the training step for this sample)
            print('Time consuming: {}s'.format(time.time() - start_time))
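
Printing one line per step gets noisy fast, so here is a minimal sketch of how the same timing could be accumulated into a summary instead. It is only an illustration under stated assumptions: `StageTimer`, `record`, `summary`, and the stage names `data_loading` and `train_step` are hypothetical and not part of DLRover or PyTorch, and it measures coarse stages with wall-clock time rather than individual PyTorch ops.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

_DONE = object()  # sentinel marking an exhausted data loader


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    @contextmanager
    def record(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Runs even if the timed block raises, so no sample is lost.
            self.total[name] += time.perf_counter() - start
            self.count[name] += 1

    def summary(self):
        """Return (name, calls, total_s, mean_s) rows, slowest stage first."""
        rows = [
            (name, self.count[name], total, total / self.count[name])
            for name, total in self.total.items()
        ]
        return sorted(rows, key=lambda row: row[2], reverse=True)


def train(timer, data_loader, step_fn, epochs=1):
    """Run step_fn on every batch, timing data loading and the step apart."""
    for _ in range(epochs):
        batches = iter(data_loader)
        while True:
            # The final fetch that finds the loader exhausted is timed too.
            with timer.record("data_loading"):
                batch = next(batches, _DONE)
            if batch is _DONE:
                break
            with timer.record("train_step"):
                step_fn(batch)


if __name__ == "__main__":
    timer = StageTimer()
    # Stand-ins for a real data loader and training step.
    train(timer, range(100), lambda batch: sum(range(1000)), epochs=2)
    for name, calls, total_s, mean_s in timer.summary():
        print(f"{name}: calls={calls} total={total_s:.6f}s mean={mean_s:.6f}s")
```

The stage at the top of the summary is the first candidate for the bottleneck. Note that on a GPU, PyTorch launches CUDA kernels asynchronously, so wall-clock timing of a step is only meaningful if the device is synchronized first (for example with `torch.cuda.synchronize()`). For a real per-op breakdown, `torch.profiler` is probably the better starting point.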

created-Bi, Oct 24 '23 03:10

This issue has been automatically marked as stale because it has not had recent activity.

github-actions[bot], Oct 23 '24 01:10