kmaeng

Results 4 comments of kmaeng

I also saw this. It happens when every seq_length in your batch is zero. I just returned a zero tensor for the aux_loss and it worked.
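For anyone hitting the same thing, here is a minimal sketch of that kind of guard. It assumes PyTorch and a masked-average auxiliary loss; the function name `masked_aux_loss` and its arguments are hypothetical and not taken from the library discussed here, so adapt it to wherever your aux_loss is actually computed.

```python
import torch


def masked_aux_loss(per_token_loss: torch.Tensor, seq_lengths: torch.Tensor) -> torch.Tensor:
    """Average a per-token auxiliary loss over the valid (non-padded) positions.

    per_token_loss: (batch, max_len) float tensor (hypothetical per-token values).
    seq_lengths:    (batch,) integer tensor with the valid length of each sequence.
    """
    # Guard: if every sequence in the batch is empty there is nothing to average,
    # so return a scalar zero tensor with the same dtype and device.
    if int(seq_lengths.sum()) == 0:
        return per_token_loss.new_zeros(())

    # Boolean mask that is True at positions before each sequence's length.
    max_len = per_token_loss.shape[1]
    positions = torch.arange(max_len, device=per_token_loss.device)
    mask = positions.unsqueeze(0) < seq_lengths.unsqueeze(1)

    # Mean over valid positions only.
    return (per_token_loss * mask).sum() / mask.sum()
```

In this sketch the unguarded average would divide by a zero count when the whole batch is empty, which is why the early return is there. The zero it returns is a constant, so it contributes nothing to the gradients.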

Hi @shizhouxing, thank you for the reply! Is there any other framework you would recommend for the L2 norm? It seems like your team has published a stream of works and...

Thanks @shizhouxing!! I will be awaiting your response.

Thanks for the help! The code became much faster by doing a vmap over a certain number of iterations. I am still interested in learning if I can easily avoid...