Shanmugam Ramasamy

Results: 26 comments by Shanmugam Ramasamy

WandB project covering TP, PP, and SP for both the NeMo legacy BERT model and the MCore BERT model. The curves match: https://wandb.ai/shanmugamr/mcore_test?workspace=user-shanmugamr

@erhoo82 Can you take a look at this?

Hi @andrewvli, this is related to the Transformer Engine version and the type of attention you are using. There are a couple of ways to fix this...