Osama Amjad

2 comments by Osama Amjad

These are my loss and learning rate curves for model training on a single GPU with a batch size of 2 on the Waymo dataset. After the 2nd epoch the loss went to NaN. ![Screenshot from 2024-04-27...

Thanks for the reply. Another question: I looked into the optimizer code, and you are splitting the model parameters into 2 groups, one for parameters with the 'block' keyword in them and the other with all...
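
For readers following the question, here is a minimal sketch of the kind of keyword-based grouping described above. It is not the repository's actual code: the function name `split_param_groups` and the example parameter names are hypothetical, and it assumes the second group simply holds every parameter whose name does not contain 'block' (the comment is cut off at that point).

```python
from typing import Any, Dict, Iterable, List, Tuple


def split_param_groups(
    named_params: Iterable[Tuple[str, Any]],
    keyword: str = "block",
) -> List[Dict[str, Any]]:
    """Split (name, parameter) pairs into two optimizer-style groups.

    Group 0: parameters whose name contains `keyword`.
    Group 1: all remaining parameters (an assumption, see the note above).
    """
    keyword_params, other_params = [], []
    for name, param in named_params:
        if keyword in name:
            keyword_params.append(param)
        else:
            other_params.append(param)
    return [{"params": keyword_params}, {"params": other_params}]


# Stand-in names and values for illustration only; with PyTorch one
# would pass model.named_parameters() instead of this list.
example = [
    ("backbone.block1.weight", 1.0),
    ("head.bias", 2.0),
    ("block2.bias", 3.0),
]
groups = split_param_groups(example)
```

The returned list of dicts has the shape that `torch.optim` optimizers accept as parameter groups, so per-group options such as `"lr"` or `"weight_decay"` could be added to each dict before passing it to the optimizer.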