Jishnu Ray Chowdhury
Don't really know what the "defaults" are... for LaProp there probably isn't any real official default. For Adam, I don't remember if there was a good recommended default. Probably it was...
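For reference, the Adam paper does suggest settings (lr=1e-3, betas=(0.9, 0.999), eps=1e-8), and PyTorch ships the same values as defaults; a quick way to check them, assuming PyTorch is installed:

```python
import torch

model = torch.nn.Linear(10, 2)

# torch.optim.Adam's defaults match the paper's suggested settings.
optimizer = torch.optim.Adam(model.parameters())
print(optimizer.defaults)  # includes lr=0.001, betas=(0.9, 0.999), eps=1e-08
```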
Yes, it seems AdaMod and RAdam would both reduce the initial variance and more. You can probably still try to use it, though there probably would not be much of a...
If you set alpha=1 you will essentially make lookahead redundant. A more principled way to disable lookahead is setting k=0, IIRC; that will deactivate all the lookahead computation. I think...
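To see why alpha=1 is a no-op, here is a minimal sketch of lookahead's slow-weight synchronization step (plain NumPy; the variable names are mine, not from any particular implementation):

```python
import numpy as np

def lookahead_sync(slow, fast, alpha):
    """Every k inner steps, lookahead pulls the slow weights
    toward the fast weights and resets the fast weights to them."""
    slow = slow + alpha * (fast - slow)  # interpolation step
    fast = slow.copy()                   # reset the inner optimizer
    return slow, fast

slow = np.zeros(2)
fast = np.array([1.0, 2.0])

# With alpha=1 the interpolation reduces to slow = fast, and the reset
# is then a no-op -- the inner optimizer runs as if lookahead were absent.
print(lookahead_sync(slow, fast, alpha=1.0))
```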
It's the original pre-trained BERT (the multilingual one) pre-trained by Google on Wikipedia and such. I didn't pre-train it further with Tweet data, though someday I might. I have a...
That's strange. I don't know why. Is the model the only thing you changed? What is the binary/multi accuracy? And what about the cross-entropy loss during training? Are...
Did you save BERT locally beforehand? See: https://github.com/JRC1995/BERT-Disaster-Classification-Capsule-Routing#saving-multilingual-bert and https://github.com/JRC1995/BERT-Disaster-Classification-Capsule-Routing/blob/master/Classification/Save_pre_trained_locally.py If so, the problem could be due to version mismatch issues. You can also try running it in an environment with an older huggingface...
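If the local save itself is the suspect, here is a minimal sketch of what such a script typically does with huggingface transformers (the checkpoint name and directory are illustrative; check the repo's actual script for the real ones):

```python
from transformers import BertModel, BertTokenizer

# Fetch the pre-trained multilingual BERT weights and tokenizer once...
model = BertModel.from_pretrained("bert-base-multilingual-cased")
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

# ...and write them to disk so later runs never hit the network.
save_dir = "Pre_trained_BERT"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Loading back from the local directory:
model = BertModel.from_pretrained(save_dir)
tokenizer = BertTokenizer.from_pretrained(save_dir)
```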
After locally downloading BERT, do the files show up in the Pre_trained_BERT folder or is it empty? The local download file is designed to download all the relevant files to...
@ktolias, it seems you are counting "tokens" at the word level (using a space tokenizer). That would not reflect the true token count, since in practice some sub-word-level tokenizer is being...
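A quick sketch of the gap between the two counts, assuming a huggingface tokenizer (the checkpoint name is just an example):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

text = "Massive flooding reported downtown"

word_count = len(text.split())                 # space-level "tokens"
subword_count = len(tokenizer.tokenize(text))  # what the model actually sees

# Sub-word tokenizers split rarer words into word pieces (continuation
# pieces are marked with "##"), so subword_count is usually >= word_count.
print(word_count, subword_count)
```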