Deep_dynamic_contextualized_word_representation
TensorFlow code and pre-trained models for A Dynamic Word Representation Model Based on Deep Context. It combines the ideas of the BERT model and ELMo's deep contextualized word representations.
Deep Dynamic Contextualized Word Representation (DDCWR)
TensorFlow code and pre-trained models for DDCWR
Important notes
- The model is simple: it only uses a feed-forward neural network with an attention mechanism (see the sketch after this list).
- Training is fast: only a few epochs are needed. The initialization parameters come from Google's pre-trained BERT model.
- The results are strong: in most cases they match the best models at the time of writing (2018-11-13), and are sometimes better. The current state of the art can be found on the GLUE benchmark leaderboard.
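As a rough illustration of a "feed-forward network with attention" classification head, here is a minimal sketch; it is not the repository's exact code, and the function name, variable names, and sizes are assumptions:

```python
# Hypothetical sketch: attention-pooled feed-forward head over contextual
# token representations (names and sizes are illustrative assumptions).
import tensorflow as tf

def attention_ffn_head(token_repr, input_mask, num_labels, hidden_size=256):
    """token_repr: [batch, seq_len, dim]; input_mask: [batch, seq_len], 1 for real tokens."""
    # Score each token, mask out padding, and normalize into attention weights.
    scores = tf.layers.dense(token_repr, 1)                      # [batch, seq_len, 1]
    scores += (1.0 - tf.cast(input_mask, tf.float32))[:, :, None] * -1e9
    weights = tf.nn.softmax(scores, axis=1)                      # attention over tokens
    pooled = tf.reduce_sum(weights * token_repr, axis=1)         # [batch, dim]
    # Feed-forward layers on top of the pooled sentence vector.
    hidden = tf.layers.dense(pooled, hidden_size, activation=tf.nn.relu)
    logits = tf.layers.dense(hidden, num_labels)
    return logits
```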
Idea behind the model
This model, Deep Dynamic Contextualized Word Representation (DDCWR), combines the BERT model with ELMo's deep contextualized word representations.
BERT comes from BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. ELMo comes from Deep contextualized word representations.
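One way to read "combining BERT and ELMo" is an ELMo-style scalar mix over the hidden states of all BERT Transformer layers, rather than using only the top layer. A minimal sketch under that assumption (in the BERT codebase the per-layer outputs are available via model.get_all_encoder_layers()):

```python
# Hypothetical sketch: ELMo-style scalar mix of all BERT layer outputs with
# learned softmax weights and a global scale, instead of the top layer only.
import tensorflow as tf

def scalar_mix(layer_outputs):
    """layer_outputs: list of [batch, seq_len, dim] tensors, one per BERT layer."""
    num_layers = len(layer_outputs)
    mix_weights = tf.get_variable("mix_weights", [num_layers],
                                  initializer=tf.zeros_initializer())
    gamma = tf.get_variable("gamma", [], initializer=tf.ones_initializer())
    # Normalize the per-layer weights and take the weighted sum of layers.
    norm_weights = tf.unstack(tf.nn.softmax(mix_weights))
    mixed = sum(w * layer for w, layer in zip(norm_weights, layer_outputs))
    return gamma * mixed
```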
Basic usage
Download Pre-trained models
Download the GLUE data using this script.
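For reference, the GLUE download script is usually invoked roughly as follows (the flag names may differ depending on the copy of the script you use):

python download_glue_data.py --data_dir $GLUE_DIR --tasks all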
Sentence (and sentence-pair) classification tasks
The usage is the same as in google-research/bert; the difference is that run_classifier_elmo.py replaces run_classifier.py:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
python run_classifier_elmo.py \
--task_name=MRPC \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=/tmp/mrpc_output/
Prediction from classifier
The usage is the same as in https://github.com/google-research/bert:
export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
export TRAINED_CLASSIFIER=/path/to/fine/tuned/classifier
python run_classifier_elmo.py \
--task_name=MRPC \
--do_predict=true \
--data_dir=$GLUE_DIR/MRPC \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$TRAINED_CLASSIFIER \
--max_seq_length=128 \
--output_dir=/tmp/mrpc_output/
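As in the original BERT repository, the prediction run writes tab-separated class probabilities to test_results.tsv in the output directory. A small sketch for turning them into predicted label indices (the path is only an example):

```python
# Hypothetical post-processing sketch: read the tab-separated probabilities
# written by the prediction run and take the argmax as the predicted label.
import csv

def read_predictions(path="/tmp/mrpc_output/test_results.tsv"):
    predictions = []
    with open(path) as f:
        for row in csv.reader(f, delimiter="\t"):
            probs = [float(p) for p in row]
            predictions.append(probs.index(max(probs)))
    return predictions
```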
For more usage details, see google-research/bert.
SQuAD 1.1
The usage is the same as in https://github.com/google-research/bert; the difference is that run_squad_elmo.py replaces run_squad.py:
python run_squad_elmo.py \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--do_train=True \
--train_file=$SQUAD_DIR/train-v1.1.json \
--do_predict=True \
--predict_file=$SQUAD_DIR/dev-v1.1.json \
--train_batch_size=12 \
--learning_rate=3e-5 \
--num_train_epochs=2.0 \
--max_seq_length=384 \
--doc_stride=128 \
--output_dir=./tmp/elmo_squad_base/
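The exact-match and F1 numbers below are computed with the official SQuAD 1.1 evaluation script against the predictions.json written to the output directory, roughly as follows (paths are assumptions):

python $SQUAD_DIR/evaluate-v1.1.py $SQUAD_DIR/dev-v1.1.json ./tmp/elmo_squad_base/predictions.json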
Experimental results
Running run_squad_elmo.py as above on the SQuAD 1.1 dev set gives:
{"exact_match": 81.20151371807, "f1": 88.56178500169332}