TransformerEngine
[UB] Adding support for multinode nvlink
This PR adds support for multi-node NVLink architectures. It also makes the CE deadlock checker configurable at runtime.
@denera I think this branch is ready for upstream, but I know you have outstanding changes as well.
@shamisp Let's review and merge this branch. I will rebase my PR on top later.
/te-ci pytorch