Megatron-LM
Remove Redundant Host & Device Sync
This unnecessary host-device sync breaks CUDA graph capture for Stable Diffusion in NeMo.
@jaredcasper Please review, thanks!