
Invalid argument: Key: segment_ids. Can't parse serialized Example.

Open · havocy28 opened this issue 5 years ago · 6 comments

When trying to use run_classifier.sh I get the error:

```
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Key: segment_ids. Can't parse serialized Example.
     [[{{node ParseSingleExample/ParseSingleExample}}]]
     [[IteratorGetNext]]
  (1) Invalid argument: Key: segment_ids. Can't parse serialized Example.
     [[{{node ParseSingleExample/ParseSingleExample}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_4055]]
```

I'm reducing the sequence length and the batch size to try to fit into the 12GB of memory on my GPU, using the following parameters:

```bash
python run_classifier.py \
  --do_train=true \
  --do_eval=false \
  --data_dir=$TFRecords \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$INIT \
  --max_seq_length=128 \
  --train_batch_size=4 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --num_cands=64 \
  --save_checkpoints_steps=6000 \
  --output_dir=$EXPTS_DIR/$EXP_NAME \
  --use_tpu=$USE_TPU
```
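For context, this error usually indicates a length mismatch between the features stored in the TFRecords and the lengths run_classifier.py declares when it parses them, e.g. records built with one max_seq_length and read back with another. A minimal sketch of the kind of parsing spec involved (feature names and shapes are assumptions based on BERT-style input pipelines, not copied from this repo):

```python
# Sketch of a BERT-style TFRecord parsing spec (assumptions: tf.io API and
# BERT-style feature names; the actual spec in run_classifier.py may differ).
import tensorflow as tf

max_seq_length = 128  # must match the value used when the TFRecords were written

name_to_features = {
    "input_ids": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "input_mask": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "segment_ids": tf.io.FixedLenFeature([max_seq_length], tf.int64),
    "label_ids": tf.io.FixedLenFeature([], tf.int64),
}

def decode_record(record):
    # Raises "Key: segment_ids. Can't parse serialized Example." when the
    # stored feature length differs from the length declared above.
    return tf.io.parse_single_example(record, name_to_features)
```

So if the records under $TFRecords were generated with max_seq_length=256, parsing them with 128 fails exactly as in the traceback above; regenerating the records with the new length (or matching the flag to the records) would resolve it.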

havocy28 · Oct 11 '19

I have the same problem as @havocy28

Looking forward to your reply.

Thanks, Best.

hitercs · Oct 23 '19

I was able to overcome this error by reducing the maximum sequence length to 64 and making the following changes in the create_training_data.py file:

Mentions that exceed the maximum sequence length are truncated to that limit; the prefix added to the mention is extended only while the result stays under the maximum sequence length, and the suffix is appended only if there is room left over. However, even with the sequence length reduced from 256 to 64, it still only runs with a batch size of 1 as opposed to 8 on 12GB of memory. I've attached the changes I made to this post: create_training_data.txt
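Roughly, the change amounts to something like this (a sketch based on the description above; the function and variable names are illustrative, not the actual ones in create_training_data.py):

```python
def build_mention_window(mention_tokens, left_ctx, right_ctx, max_seq_length):
    """Truncate an over-long mention, then add context only where room remains.

    All arguments are lists of word-piece tokens; names are illustrative,
    not the actual identifiers used in create_training_data.py.
    """
    # Hard-limit the mention itself to the maximum sequence length.
    tokens = mention_tokens[:max_seq_length]

    # Extend the prefix (left context) only while under the limit.
    room = max_seq_length - len(tokens)
    if room > 0:
        tokens = left_ctx[-room:] + tokens

    # Append the suffix (right context) only if there is room left over.
    room = max_seq_length - len(tokens)
    if room > 0:
        tokens = tokens + right_ctx[:room]

    return tokens
```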

Note: You also have to delete the existing tfrecords before running create_training_data.sh again.
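For example (assuming the records live wherever $TFRecords points; the exact filenames may differ):

```bash
rm $TFRecords/*.tfrecord    # clear stale records built with the old max_seq_length
./create_training_data.sh   # rebuild with the new settings
```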

havocy28 · Oct 25 '19

Hi @havocy28,

Thanks for sharing. Best.

hitercs · Oct 25 '19

It's probably worth mentioning that the performance was terrible in my evaluation:

```
I1026 08:16:01.098551 140515070199552 run_classifier.py:440] ***** Eval results *****
I1026 08:16:01.098709 140515070199552 run_classifier.py:442] eval_accuracy = 0.057
I1026 08:16:01.099158 140515070199552 run_classifier.py:442] eval_loss = 4.1595993
I1026 08:16:01.099335 140515070199552 run_classifier.py:442] global_step = 0
I1026 08:16:01.099476 140515070199552 run_classifier.py:442] loss = 4.1595993
```

I did not adjust the learning rate and there may be bugs in the modifications I made. If anyone finds reduced parameters that work, please share.

havocy28 · Oct 25 '19

Hi @havocy28, can you share your run_classifier.py for the GPU? Thanks. Best.

JiangzuoQinglang · Mar 15 '21

Hi @havocy28, if I want to train this on the GPU, can you share your run_classifier.py? Thanks. Best.

WUZHIWEI2000 · Dec 07 '22