
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

Results: 62 issues

# Overview

I am attempting to train the small version of ELECTRA on a custom vocabulary. Looking at the code, I see that `max_predictions_per_seq` is set by a heuristic formula:...
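For context, a minimal sketch of the heuristic being asked about, assuming it is the one in `configure_pretraining.py` (the default values below are assumptions based on the small model's published settings):

```python
# Sketch of the max_predictions_per_seq heuristic (paraphrased, not copied):
# the number of masked positions scales with the mask rate and sequence length.
mask_prob = 0.15        # assumed default fraction of tokens to mask
max_seq_length = 128    # assumed ELECTRA-small default

# The +0.005 nudges the product up slightly before truncating to an int.
max_predictions_per_seq = int((mask_prob + 0.005) * max_seq_length)
print(max_predictions_per_seq)  # 19 for the defaults above
```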

Hello, could someone please explain what each evaluation metric indicates?
```
disc_accuracy = 0.86676794
disc_auc = 0.6815034
disc_loss = 0.35936752
disc_precision = 0.7586109
disc_recall = 0.04827826
global_step = 5000
...
```
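For orientation, a minimal sketch (the data and variable names are illustrative, not from the repo) of what these discriminator metrics measure: ELECTRA's discriminator makes a per-token binary prediction of whether each token was replaced by the generator, and the `disc_*` values are standard binary-classification scores over those predictions. The low `disc_recall` above likely reflects that most tokens are originals, so the discriminator flags replacements conservatively.

```python
# Illustrative sketch of the discriminator metrics: per-token binary
# classification of "replaced" (1) vs. "original" (0).
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

labels = np.array([0, 0, 1, 0, 1, 0])               # 1 = token was replaced
probs  = np.array([0.1, 0.2, 0.7, 0.4, 0.3, 0.05])  # discriminator P(replaced)
preds  = (probs > 0.5).astype(int)                  # thresholded predictions

print("disc_accuracy ", accuracy_score(labels, preds))   # tokens classified correctly
print("disc_precision", precision_score(labels, preds))  # of flagged tokens, how many were replaced
print("disc_recall   ", recall_score(labels, preds))     # of replaced tokens, how many were flagged
print("disc_auc      ", roc_auc_score(labels, probs))    # threshold-free ranking quality
```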

When I run `python3 run_pretraining.py --data-dir $DATA_DIR --model-name electra_small_owt`, I get the following error:
```
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Data loss: truncated record at 10035180
[[node...
```
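A `DataLossError: truncated record` usually means one of the pretraining TFRecord files was cut short (for example, by an interrupted dataset build). A minimal sketch for locating the corrupt file, assuming the records live under `data/pretrain_tfrecords` (the path is an assumption based on the repo's layout):

```python
# Sketch: iterate every TFRecord file and report which one raises DataLossError.
# Adjust the glob pattern to wherever your pretraining records actually live.
import glob
import tensorflow.compat.v1 as tf

for path in sorted(glob.glob("data/pretrain_tfrecords/*")):
    try:
        count = 0
        for _ in tf.io.tf_record_iterator(path):  # deprecated but available in TF1
            count += 1
        print("OK ", path, count, "records")
    except tf.errors.DataLossError as e:
        print("BAD", path, "-", e)  # rebuild or delete this file
```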

Running
```
python3 build_openwebtext_pretraining_dataset.py --data-dir data --num-processes 8
```
gives the error:
```
Traceback (most recent call last):
  File "build_openwebtext_pretraining_dataset.py", line 103, in <module>
    main()
  File "build_openwebtext_pretraining_dataset.py", line 89, in main
...
```

Please add a package configuration so that the ELECTRA repository can be easily installed and used. The purpose is to remove the need to clone the repository...
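A minimal sketch of what such packaging could look like (the package name, version, and dependency pins are assumptions; the repo does not currently ship a `setup.py`):

```python
# setup.py -- hypothetical packaging sketch, not an existing file in the repo.
from setuptools import setup, find_packages

setup(
    name="electra",                # assumed package name
    version="0.1.0",               # assumed version
    packages=find_packages(),
    install_requires=[
        "tensorflow>=1.15,<2.0",   # the repo targets TF1
        "numpy",
        "scikit-learn",            # assumed, for evaluation metrics
    ],
)
```

With this in place, `pip install -e .` (or installing from a Git URL) would make the code importable without manually cloning it into every project.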

- add a variable `init_checkpoint` to `configure_pretraining.py`
- add code for continuing pre-training from an ELECTRA checkpoint to `run_pretraining.py`
- update README (instructions for continuing pre-training from an ELECTRA checkpoint...
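A minimal sketch of what the checkpoint-loading step in `run_pretraining.py` could look like (the function and config names here are assumptions, not the PR's actual diff):

```python
# Hypothetical sketch: warm-start pre-training from an existing ELECTRA checkpoint.
import tensorflow.compat.v1 as tf

def maybe_init_from_checkpoint(init_checkpoint):
    """Initialize trainable variables from `init_checkpoint`, if one is set."""
    if not init_checkpoint:
        return
    # Map each variable to its own name; every mapped name must exist in the
    # checkpoint, so the model configuration has to match the saved one.
    assignment_map = {
        var.name.split(":")[0]: var for var in tf.trainable_variables()
    }
    tf.train.init_from_checkpoint(init_checkpoint, assignment_map)
```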

I noticed there are only GLUE test set results for ELECTRA-small and ELECTRA-small++ in Table 8, and GLUE dev set overall results for BERT-small and ELECTRA-small in Table 1. Could you...

Hello, I was wondering whether it is possible to add some loss metrics to the training loop. The only thing I see while training the ELECTRA model is `1275000/3000000 = 42.5%,...
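One way to surface losses during training is a `LoggingTensorHook` attached to the estimator; the sketch below is illustrative, and whether it slots cleanly into the repo's `run_pretraining.py` (and the tensor names used) is an assumption:

```python
# Hypothetical sketch: log the total loss every N steps of estimator training.
import tensorflow.compat.v1 as tf

def loss_logging_hook(total_loss, every_n_steps=100):
    """Return a hook that prints `total_loss` every `every_n_steps` steps."""
    return tf.train.LoggingTensorHook(
        {"total_loss": total_loss}, every_n_iter=every_n_steps)

# Assumed integration point, inside a model_fn:
#   hooks = [loss_logging_hook(total_loss)]
#   return tf.estimator.EstimatorSpec(mode, loss=total_loss, training_hooks=hooks)
```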

The function signature for `tf.nn.dropout` in TF1 is:
```python
tf.nn.dropout(
    x,
    keep_prob=None,
    noise_shape=None,
    seed=None,
    name=None,
    rate=None
)
```
while TF2 has:
```python
tf.nn.dropout(
    x,
    rate,
    noise_shape=None,
    seed=None,
    name=None
)
```
...
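If the goal is code that runs under both signatures, a small shim along these lines (an illustrative workaround, not something the repo provides) pins everything to the TF2-style `rate` convention:

```python
import tensorflow as tf

def dropout_compat(x, rate, **kwargs):
    """Apply dropout with the `rate` argument on either TF version."""
    try:
        # TF2, and TF1 >= 1.13 where `rate` was added alongside `keep_prob`.
        return tf.nn.dropout(x, rate=rate, **kwargs)
    except TypeError:
        # Older TF1: only `keep_prob` (probability of *keeping* a unit) exists.
        return tf.nn.dropout(x, keep_prob=1.0 - rate, **kwargs)
```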

Avoid masking `[PAD]` during dynamic masking (Issue #59)
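A minimal sketch of the idea (illustrative, not the PR's actual diff): when sampling positions to mask dynamically, draw only from a candidate set that excludes padding positions, assuming an `input_ids` array and a known `pad_id`:

```python
# Illustrative sketch of excluding [PAD] positions from dynamic-masking candidates.
import numpy as np

def sample_mask_positions(input_ids, pad_id, num_to_mask, rng=np.random):
    """Sample positions to mask, never choosing padding positions."""
    candidates = np.where(input_ids != pad_id)[0]  # indices of non-pad tokens
    num_to_mask = min(num_to_mask, len(candidates))
    return rng.choice(candidates, size=num_to_mask, replace=False)

# Example: a sequence padded to length 8 with pad_id=0.
ids = np.array([101, 2054, 2003, 102, 0, 0, 0, 0])
print(sample_mask_positions(ids, pad_id=0, num_to_mask=2))  # only indices 0..3
```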