poke
poke
i want to compare against a BERT model on your dataset, but i cannot find your leaderboard
hi, i find that ALBERT speeds up training, but slows down the forward pass
i use SRU instead of LSTM; after training on the 2nd batch, i get this error: RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:70
how do i set l in grad(x, l)? what does it mean when i is bigger?
does blocksparse support GPU?
with version 0.4.21 i get this error: cl: error: unrecognized arguments: --request-network
i use m0 to classify unknown questions, but get a large loss; how do you adapt it to SQuAD 2.0?