BERT-BiLSTM-CRF-NER
Found Inf or NaN global norm (#100)
I keep running into "Found Inf or NaN global norm". What should I do?
INFO:tensorflow:Saving checkpoints for 0 into ./output/result_dir/model.ckpt.
2019-04-01 11:26:15.232850: E tensorflow/core/kernels/check_numerics_op.cc:185] abnormal_detected_host @0x7f1ba9624600 = {1, 0} Found Inf or NaN global norm.
INFO:tensorflow:Error recorded from training_loop: Found Inf or NaN global norm. : Tensor had NaN values
  [[node VerifyFinite/CheckNumerics (defined at /disk1/hanyaqian/code/work15_bert_cpr/youdao_cpr/bert/optimization.py:74) = CheckNumerics[T=DT_FLOAT, message="Found Inf or NaN global norm.", _device="/job:localhost/replica:0/task:0/device:GPU:0"](global_norm/global_norm)]]

Caused by op u'VerifyFinite/CheckNumerics', defined at:
  File "run_classifier_cpr.py", line 785, in <module>
    tf.app.run()
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_classifier_cpr.py", line 712, in main
    estimator.train(input_fn=train_input_fn, max_steps=next_checkpoint)
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2403, in train
    saving_listeners=saving_listeners
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 354, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1207, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1237, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2195, in _call_model_fn
    features, labels, mode, config)
  File "/disk1/hanyaqian/code/work15_bert_cpr/venv/lib/python2.7/site-packages/tensorflow/python/estimator/estimator.py", line 1195, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
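It looks like you are not running my code as-is, and it is not clear what you changed, so I cannot tell what the problem is.
One more thing: this has not been tested in a Python 2.7 environment.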
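I ran into the same problem (though in a different program). I debugged it with tfdbg, which helps locate NaN values in a program, and found that a spot I had overlooked was computing 0 divided by 0, which produced the NaN. Hope this helps.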
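To make the 0/0 case above concrete, here is a minimal sketch in plain Python rather than TensorFlow. Everything in it is an assumption for illustration: `ieee_divide`, `masked_mean`, `find_non_finite`, and the `eps` guard are hypothetical names, not code from this repository. It mimics IEEE-754 float division (plain Python raises `ZeroDivisionError` instead of returning NaN), shows how an all-zero mask turns a mean into NaN, and scans named values for non-finite entries, which is roughly the kind of check that tfdbg's `has_inf_or_nan` filter automates.

```python
import math


def ieee_divide(num, den):
    """Divide the way IEEE-754 float hardware does: 0/0 -> NaN, x/0 -> +/-Inf.

    Plain Python raises ZeroDivisionError here, so the special cases are
    spelled out by hand (hypothetical helper, for illustration only).
    """
    if den == 0.0:
        if num == 0.0 or math.isnan(num):
            return math.nan
        # The sign of the result follows the signs of both operands.
        return math.copysign(math.inf, num) * math.copysign(1.0, den)
    return num / den


def masked_mean(values, mask, eps=0.0):
    """Mean of `values` where `mask` is 1; with eps=0.0 an all-zero mask gives NaN."""
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    # Guard: never divide by less than eps (assumed fix, not the repo's code).
    return ieee_divide(total, max(count, eps))


def find_non_finite(named_values):
    """Return the names whose value lists contain NaN or Inf, in insertion order."""
    return [
        name
        for name, values in named_values.items()
        if any(not math.isfinite(v) for v in values)
    ]


if __name__ == "__main__":
    print(masked_mean([1.0, 2.0], [0, 0]))            # unguarded 0/0
    print(masked_mean([1.0, 2.0], [0, 0], eps=1e-8))  # guarded
    print(find_non_finite({"logits": [0.1, 0.2], "loss": [math.nan]}))
```

In a real graph the same idea applies to any division whose denominator can be zero for some batch, such as a mean over an empty mask or a normalisation by sequence length.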
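It runs fine for me on TensorFlow 1.9, but on TensorFlow 1.13 it keeps showing "Found Inf or NaN global norm". Apart from changing file paths I did not modify any code, which is very strange. The error points at the line (grads, _) = tf.clip_by_global_norm(grads, clip_norm=1.0). I adjusted learning_rate and it still fails.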
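For anyone wondering why changing learning_rate may not help here, this is a rough sketch in plain Python of what global-norm clipping computes. It is not TensorFlow's implementation: `global_norm` and `clip_by_global_norm` below are simplified stand-ins that work on lists of floats, and raising `FloatingPointError` is an assumed stand-in for the `CheckNumerics` failure in the log above. The point is that the norm is taken over all gradients together, so one NaN or Inf anywhere makes it non-finite, and this check runs on the gradients before the optimizer applies the learning rate for that step.

```python
import math


def global_norm(grads):
    """Square root of the sum of squares over every element of every gradient."""
    return math.sqrt(sum(g * g for grad in grads for g in grad))


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so their global norm is at most clip_norm.

    Simplified stand-in: raises if the norm is NaN or Inf, mirroring the
    "Found Inf or NaN global norm." check seen in the traceback.
    """
    norm = global_norm(grads)
    if not math.isfinite(norm):
        raise FloatingPointError("Found Inf or NaN global norm.")
    # No scaling when norm <= clip_norm, otherwise shrink by clip_norm / norm.
    scale = clip_norm / max(norm, clip_norm)
    return [[g * scale for g in grad] for grad in grads], norm


if __name__ == "__main__":
    clipped, norm = clip_by_global_norm([[3.0], [4.0]], clip_norm=1.0)
    print(norm, clipped)

    try:
        # A single NaN in one gradient poisons the norm of the whole list.
        clip_by_global_norm([[3.0], [math.nan]], clip_norm=1.0)
    except FloatingPointError as err:
        print("failed:", err)
```

The log in the original post shows the failure right after the checkpoint for step 0, which suggests the NaN is already present in the first forward or backward pass. If that is the case, the input data, the loss, and any divisions in the model would be the first places I would check.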
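Have you solved this problem? I also got a NaN at this step in another task after switching TensorFlow versions.
I have not solved it either. It seems to be caused by the TensorFlow version. I suggest switching to a PyTorch version, which has better compatibility. Hope this helps.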
Got it, thanks a lot.