
[UB] Adding support for multinode nvlink

Open shamisp opened this issue 9 months ago • 3 comments

This adds support for the multi-node NVLink architecture. In addition, it includes changes that make the CE deadlock checker configurable at runtime.

shamisp avatar Apr 26 '24 15:04 shamisp

@denera I think this branch is ready for upstream, but I know you have outstanding changes as well.

shamisp avatar Jun 04 '24 22:06 shamisp

@shamisp Let's review and merge this branch. I will rebase my PR on top later.

denera avatar Jun 04 '24 22:06 denera

/te-ci pytorch

timmoon10 avatar Jun 13 '24 00:06 timmoon10