
[ALBERT] TFHub assets/albert_config.json has 0 for dropouts

kpe opened this issue · 2 comments

I noticed that the TFHub-released ALBERT v2 models specify zero for attention_probs_dropout_prob and hidden_dropout_prob in assets/albert_config.json:

```json
{
  "attention_probs_dropout_prob": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0,
  ...
}
```

However, the [README](https://tfhub.dev/google/albert_base/2) on TFHub specifies:
```json
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  ...
}
```

Maybe one of them (the README or assets/albert_config.json) should be updated?
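For reference, here is a minimal sketch of how the shipped config can be inspected after downloading and extracting the TFHub module; the local path `albert_base/` is an assumption:

```python
# Minimal sketch: inspect the dropout values in the config shipped with the
# TFHub module. The extraction path "albert_base/" is an assumption; adjust
# it to wherever the module was saved.
import json

with open("albert_base/assets/albert_config.json") as f:
    config = json.load(f)

print(config["attention_probs_dropout_prob"])  # 0 in the v2 assets
print(config["hidden_dropout_prob"])           # 0 in the v2 assets
```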

I'm also wondering whether it would be a good idea to provide a do_lower_case flag somewhere under assets/, just as a minimal specification of the required text pre-processing. Probably such a do_lower_case flag belongs with the sentencepiece model - what do you think?

kpe · Nov 19 '19 15:11

I see the README on TFHub has not been updated yet; those parameters are still the same as in v1. As the authors mention in the README of this project (on GitHub), no dropout is used in version 2.

The do_lower_case flag in create_pretraining_data.py handles the text pre-processing and produces the tfrecords used for pre-training, so I don't think do_lower_case needs to appear in other files.

ngoanpv · Nov 19 '19 18:11

Thank you, @ngoanpv! About the do_lower_case flag: currently, if I download the pre-trained ALBERT weights and sentencepiece model from TFHub, I have to know how to pre-process the input - should I lower-case it or leave the case as it is (i.e. how should the do_lower_case flag be set when calling preprocess_text)?

Currently, if I call the sentencepiece tokenizer on upper case text, it replaces the upper case characters with `<unk>`.
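To illustrate, a minimal sketch using the plain sentencepiece Python API; the local path to the 30k-clean.model asset is an assumption:

```python
# Minimal sketch of the case-handling issue, assuming the sentencepiece model
# shipped under the TFHub module's assets/ has been saved locally.
import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("albert_base/assets/30k-clean.model")  # path is an assumption

text = "Hello World"
print(sp.EncodeAsPieces(text))          # upper case characters may come back as <unk>
print(sp.EncodeAsPieces(text.lower()))  # lower-casing first avoids the <unk> pieces
```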

As I just learned - see https://github.com/google/sentencepiece/issues/425 - case folding can be done by the sentencepiece model itself if it is trained accordingly:

```
spm_train --normalization_rule_name=nmt_nfkc_cf ...
```

i.e. by using a case-folding normalization rule. This way the case handling would be encapsulated within the sentencepiece model itself.
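For completeness, a minimal sketch of the same thing via the sentencepiece Python API; the corpus path, model prefix, and vocab size below are placeholders, not the values used for the released ALBERT models:

```python
# Minimal sketch: train a sentencepiece model with NFKC normalization plus
# case folding, so the model itself lower-cases its input. The corpus path,
# model prefix, and vocab size are placeholders.
import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    "--input=corpus.txt "
    "--model_prefix=albert_cf "
    "--vocab_size=30000 "
    "--normalization_rule_name=nmt_nfkc_cf"
)

sp = spm.SentencePieceProcessor()
sp.Load("albert_cf.model")
print(sp.EncodeAsPieces("Hello World"))  # case is folded by the model, no <unk> pieces
```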

kpe · Nov 20 '19 09:11