Runqi Yang
You recently updated "BERT pretrained on mixed large Chinese corpus (bert-large 24-layers)" in the README. Which hyperparameters (learning rate, batch size, max epochs) did you use when fine-tuning on CLUE?
When training with the following command:

```sh
python train.py --data_path PATH_TO_PROCESSED_DATA --enc_grad_norm False
```

`enc_grad_norm` is still `True`, because `argparse` with `type=bool` treats any non-empty string (including `"False"`) as truthy. Should this be fixed in [this way](https://stackoverflow.com/questions/15008758/parsing-boolean-values-with-argparse)?
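For reference, a minimal sketch of the fix proposed in that Stack Overflow thread: a custom `type` callable that parses boolean-like strings. The flag name `--enc_grad_norm` is taken from the command above; the helper name `str2bool` is the conventional one from the linked answer, not necessarily what the repo would use:

```python
import argparse

def str2bool(v):
    # argparse passes the raw string; map common spellings to booleans.
    if isinstance(v, bool):
        return v
    if v.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if v.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"Boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
# type=str2bool instead of type=bool, so "--enc_grad_norm False" parses as False
parser.add_argument("--enc_grad_norm", type=str2bool, default=True)

args = parser.parse_args(["--enc_grad_norm", "False"])
print(args.enc_grad_norm)  # False
```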
Lines 242-244, function `evaluate_autoencoder` in `train.py`:

```python
all_accuracies += \
    torch.mean(max_indices.eq(masked_target).float()).data[0]
bcnt += 1
```

Since the masks differ between batches, each batch contributes a different number of masked indices, so averaging the per-batch accuracies over `bcnt` weights every batch equally rather than every token equally...
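A minimal sketch of one way to weight every masked token equally instead: accumulate correct-prediction counts and token counts across batches, and divide once at the end. The names `max_indices` and `masked_target` follow the snippet above; the toy batches are purely illustrative, not the repo's data. (Separately, `.data[0]` is deprecated in modern PyTorch in favor of `.item()`.)

```python
import torch

# Two illustrative batches whose masks select different numbers of tokens.
batches = [
    (torch.tensor([1, 2, 3]), torch.tensor([1, 2, 0])),              # 3 masked tokens
    (torch.tensor([4, 5, 6, 7, 8]), torch.tensor([4, 0, 6, 7, 8])),  # 5 masked tokens
]

n_correct, n_total = 0, 0
for max_indices, masked_target in batches:
    eq = max_indices.eq(masked_target)
    n_correct += eq.sum().item()  # correct predictions in this batch
    n_total += eq.numel()         # masked tokens in this batch

accuracy = n_correct / n_total    # every token weighted equally
print(accuracy)  # 0.75 (6 correct out of 8 masked tokens)
```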