DeepSpeedExamples
Unit in (model-only) latency
In print_latency, mtimes is multiplied by 1000 to convert it to ms, but it is already in ms, so the printed (model-only) latency is 1000 times too large.
use_cuda_events is True by default in the function profile_model_time, so the model times are measured with CUDA events.
From the description of elapsed_time at https://pytorch.org/docs/stable/generated/torch.cuda.Event.html:
Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded.
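
To illustrate the effect, here is a minimal sketch in plain Python. It is not the actual print_latency code: summarize_latency_ms and its already_ms flag are hypothetical names made up for this example, and the sample timings are invented. It only shows how multiplying values that are already in ms by 1000 inflates the reported latency.

```python
import statistics


def summarize_latency_ms(times, already_ms=True):
    """Return (mean, median) latency in ms.

    Hypothetical helper for illustration only, not part of DeepSpeedExamples.
    `times` are assumed to come from a timer such as
    torch.cuda.Event.elapsed_time, which reports milliseconds.
    """
    # Only convert when the input is really in seconds.
    scale = 1.0 if already_ms else 1000.0
    scaled = [t * scale for t in times]
    return statistics.mean(scaled), statistics.median(scaled)


# Invented sample timings, in ms, as a CUDA-event timer would report them.
mtimes = [12.0, 14.0, 13.0]

correct = summarize_latency_ms(mtimes, already_ms=True)
inflated = summarize_latency_ms(mtimes, already_ms=False)  # the reported bug

print(f"correct mean:  {correct[0]:.2f} ms")
print(f"inflated mean: {inflated[0]:.2f} ms")
```

If this reading is right, the fix would be to skip the multiplication for mtimes whenever the timings come from CUDA events.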