Jishnu Ray Chowdhury
Don't really know what the "defaults" are... for LaProp there probably isn't any real official default. For Adam, I don't remember if there was a good recommended default. Probably it was...
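For reference, the Adam paper does suggest settings (lr=1e-3, betas=(0.9, 0.999), eps=1e-8), and PyTorch ships the same values as defaults; a quick way to check them, assuming PyTorch is installed:

```python
import torch

model = torch.nn.Linear(10, 2)

# torch.optim.Adam's defaults match the paper's suggested settings.
optimizer = torch.optim.Adam(model.parameters())
print(optimizer.defaults)  # includes lr=0.001, betas=(0.9, 0.999), eps=1e-08
```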
Yes, it seems AdaMod and RAdam would both reduce the initial variance and more. You can probably still try to use it, though there probably would not be much of a...
If you set alpha=1 you will essentially make lookahead redundant. A more principled way to disable lookahead is setting k=0, IIRC; that will deactivate all the lookahead computation. I think...
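To see why alpha=1 is a no-op, here is a minimal sketch of lookahead's slow-weight synchronization step (plain NumPy; the variable names are mine, not from any particular implementation):

```python
import numpy as np

def lookahead_sync(slow, fast, alpha):
    """Every k inner steps, lookahead pulls the slow weights
    toward the fast weights and resets the fast weights to them."""
    slow = slow + alpha * (fast - slow)  # interpolation step
    fast = slow.copy()                   # reset the inner optimizer
    return slow, fast

slow = np.zeros(2)
fast = np.array([1.0, 2.0])

# With alpha=1 the interpolation reduces to slow = fast, and the reset
# is then a no-op -- the inner optimizer runs as if lookahead were absent.
print(lookahead_sync(slow, fast, alpha=1.0))
```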
It's the original pre-trained BERT (the multilingual one) pre-trained by Google on Wikipedia and such. I didn't pre-train it further with Tweet data, though someday I might. I have a...
That's strange. I don't know why. Is the model the only thing you changed? What is the binary/multi accuracy? And what about the cross-entropy loss during training? Are...
Did you save BERT locally beforehand? See: https://github.com/JRC1995/BERT-Disaster-Classification-Capsule-Routing#saving-multilingual-bert and https://github.com/JRC1995/BERT-Disaster-Classification-Capsule-Routing/blob/master/Classification/Save_pre_trained_locally.py If so, the problem could be due to version mismatch issues. You can also try running it in an environment with an older huggingface...
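If the local save itself is the suspect, here is a minimal sketch of what such a script typically does with huggingface transformers (the checkpoint name and directory are illustrative; check the repo's actual script for the real ones):

```python
from transformers import BertModel, BertTokenizer

# Fetch the pre-trained multilingual BERT weights and tokenizer once...
model = BertModel.from_pretrained("bert-base-multilingual-cased")
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

# ...and write them to disk so later runs never hit the network.
save_dir = "Pre_trained_BERT"
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)

# Loading back from the local directory:
model = BertModel.from_pretrained(save_dir)
tokenizer = BertTokenizer.from_pretrained(save_dir)
```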
After locally downloading BERT, do the files show up in the Pre_trained_BERT folder or is it empty? The local download file is designed to download all the relevant files to...
@ktolias, it seems you are counting "tokens" at the word level (using a space tokenizer). That would not reflect the true token count, since in practice some sub-word-level tokenizer is being...
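A quick sketch of the gap between the two counts, assuming a huggingface tokenizer (the checkpoint name is just an example):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")

text = "Massive flooding reported downtown"

word_count = len(text.split())                 # space-level "tokens"
subword_count = len(tokenizer.tokenize(text))  # what the model actually sees

# Sub-word tokenizers split rarer words into word pieces (continuation
# pieces are marked with "##"), so subword_count is usually >= word_count.
print(word_count, subword_count)
```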