bigbird
Error in run_classifier.py for attention_type=simulated_sparse
I am using the script base_size.sh to run run_classifier.py. I am able to train and evaluate on the IMDb data with attention_type set to original_full or block_sparse, but when I set it to simulated_sparse, I see errors while training is being initialized. The 12 layers are initialized, but training doesn't start. The main part of the error log is below:
  File "/home/amitghattimare/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3211, in _as_graph_def
    graph.ParseFromString(compat.as_bytes(data))
google.protobuf.message.DecodeError: Error parsing message
I used the command below to run the code, in case it helps with the investigation. If I change attention_type to either of the other two options, it works fine. I am using only 8 cores because that is the maximum available in preemptible mode. I have reduced train_batch_size so that the model fits in memory. I wonder whether that is causing the issue, though the error logs don't indicate it.
python3 bigbird/classifier/run_classifier.py \
--data_dir=tfds://imdb_reviews/plain_text \
--output_dir=gs://bigbird-replication-bucket/classifier/imdb/sim_sparse_attention \
--attention_type=simulated_sparse \
--max_encoder_length=4096 \
--num_attention_heads=12 \
--num_hidden_layers=12 \
--hidden_size=768 \
--intermediate_size=3072 \
--block_size=64 \
--train_batch_size=1 \
--eval_batch_size=2 \
--do_train=True \
--do_eval=False \
--num_train_steps=1000 \
--use_tpu=True \
--tpu_name=bigbird \
--tpu_zone=us-central1-b \
--gcp_project=bigbird-replication \
--num_tpu_cores=8 \
--init_checkpoint=gs://bigbird-transformer/pretrain/bigbr_base/model.ckpt-0
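
To compare the three settings side by side, here is a minimal dry-run sketch that only prints the command for each attention_type and launches nothing. The build_cmd helper and the per-type output subdirectories are hypothetical names for illustration, not part of the repository, and only a subset of the flags is shown; the remaining flags would be the same as in the full command above.

```shell
#!/usr/bin/env bash
# Dry run: print one command line per attention type, without launching anything.
# ASSUMPTION: build_cmd and the per-type output subdirectories are illustrative only.
OUTPUT_ROOT="gs://bigbird-replication-bucket/classifier/imdb"

build_cmd() {
  local attention_type="$1"
  # echo joins its arguments with spaces, so each call prints a single line.
  echo "python3 bigbird/classifier/run_classifier.py" \
    "--data_dir=tfds://imdb_reviews/plain_text" \
    "--output_dir=${OUTPUT_ROOT}/${attention_type}" \
    "--attention_type=${attention_type}" \
    "--max_encoder_length=4096" \
    "--block_size=64" \
    "--train_batch_size=1" \
    "--do_train=True" \
    "--do_eval=False"
}

# The three values mentioned in this report.
for attention_type in original_full block_sparse simulated_sparse; do
  build_cmd "${attention_type}"
done
```

Only the --attention_type and --output_dir values differ between the three printed commands, which should make it easier to confirm that nothing else changes between the working and failing runs.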