Shanmugam Ramasamy
jenkins
jenkins
WandB project covering TP, PP, and SP for both the NeMo legacy BERT model and the MCore BERT model. The curves match: https://wandb.ai/shanmugamr/mcore_test?workspace=user-shanmugamr
jenkins
@erhoo82 Can you take a look at this?
jenkins
jenkins
jenkins
jenkins
Hi @andrewvli, this comes down to the Transformer Engine version and the type of attention you are using. There are a couple of ways to fix this...