DeepCTR
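DIN model: problem encountered with run_din
Describe the question: I ran run_din.py directly with the number of epochs set to 10. After about 6 epochs it failed with the error below. The gist seems to be "IteratorResource does not exist", but I don't know why this happens. Could someone give me some pointers?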
`Train on 1 samples, validate on 2 samples
Epoch 1/10
C:\Software\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
C:\Software\Anaconda3\lib\site-packages\tensorflow_core\python\framework\indexed_slices.py:424: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
2019-11-15 17:44:39.292483: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-11-15 17:44:39.296103: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-11-15 17:44:39.306838: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] model_pruner failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'model/attention_sequence_pooling_layer/local_activation_unit/concat' has self cycle fanin 'model/attention_sequence_pooling_layer/local_activation_unit/concat'.
2019-11-15 17:44:39.314760: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: MutableGraphView::MutableGraphView error: node 'model/attention_sequence_pooling_layer/local_activation_unit/concat' has self cycle fanin 'model/attention_sequence_pooling_layer/local_activation_unit/concat'.
2019-11-15 17:44:39.316846: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] arithmetic_optimizer failed: Invalid argument: The graph couldn't be sorted in topological order.
2019-11-15 17:44:39.321719: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-11-15 17:44:39.325654: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:697] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-11-15 17:44:39.330328: W tensorflow/core/common_runtime/process_function_library_runtime.cc:675] Ignoring multi-device function optimization failure: Invalid argument: The graph couldn't be sorted in topological order.
1/1 [==============================] - 5s 5s/sample - loss: 0.7042 - binary_crossentropy: 0.7042 - val_loss: 0.6975 - val_binary_crossentropy: 0.6975
Epoch 2/10
1/1 [==============================] - 0s 33ms/sample - loss: 0.6956 - binary_crossentropy: 0.6956 - val_loss: 0.6961 - val_binary_crossentropy: 0.6961
Epoch 3/10
1/1 [==============================] - 0s 35ms/sample - loss: 0.6892 - binary_crossentropy: 0.6892 - val_loss: 0.6948 - val_binary_crossentropy: 0.6948
Epoch 4/10
1/1 [==============================] - 0s 43ms/sample - loss: 0.6836 - binary_crossentropy: 0.6836 - val_loss: 0.6938 - val_binary_crossentropy: 0.6938
Epoch 5/10
1/1 [==============================] - 0s 58ms/sample - loss: 0.6779 - binary_crossentropy: 0.6779 - val_loss: 0.6928 - val_binary_crossentropy: 0.6928
Epoch 6/10
1/1 [==============================] - 0s 61ms/sample - loss: 0.6730 - binary_crossentropy: 0.6730 - val_loss: 0.6920 - val_binary_crossentropy: 0.6920
Epoch 7/10
2019-11-15 17:44:39.675306: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at iterator_ops.cc:893 : Not found: Resource AnonymousIterator/AnonymousIterator7/class tensorflow::data::IteratorResource does not exist.
2019-11-15 17:44:39.676532: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Not found: Resource AnonymousIterator/AnonymousIterator7/class tensorflow::data::IteratorResource does not exist.
	 [[{{node IteratorGetNext}}]]
Traceback (most recent call last):
1/1 [==============================] - 0s 68ms/sample - loss: 0.6679 - binary_crossentropy: 0.6679
File "C:/Workspace/python/recommend_system/DeepCTR/examples/run_din.py", line 39, in
Function call stack: distributed_function
HTTPSConnectionPool(host='pypi.python.org', port=443): Max retries exceeded with url: /pypi/deepctr/json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001E74A08F940>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
Process finished with exit code 1 `
Operating environment:
- python version [3.6]
- tensorflow version [2.0.0]
- deepctr version [0.6.3]
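Hmm, this problem is quite strange. The error above is what I get when running on Windows, but when I run the same thing on a Linux server there is no problem. So what actually causes the error above?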
I hit the same problem on a Mac,
but occasionally the run completes successfully.
I updated to tensorflow-2.1.0-rc0. It runs and produces results, but this error still shows up.
I just ran into this problem with 2.0. After updating to the stable tf 2.1 release, the error is no longer reported.
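Going only by the reports in this thread (failure on tensorflow 2.0.0, the error still logged on 2.1.0-rc0, no error on stable 2.1), a quick version check before running run_din.py could look roughly like the sketch below. The helper names `parse_major_minor`, `reported_as_affected`, and `check_installed_tensorflow` are made up for illustration and are not part of DeepCTR or TensorFlow. The "affected" rule is an assumption drawn from the comments above, not a confirmed root cause.

```python
import re


def parse_major_minor(version: str) -> tuple:
    """Extract (major, minor) from a version string such as '2.1.0-rc0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def reported_as_affected(version: str) -> bool:
    """Return True for versions this thread reports as showing the error.

    Assumption based only on the comments above: 2.0.x fails, and a
    2.1 release candidate still logs the error.
    """
    major_minor = parse_major_minor(version)
    is_release_candidate = "rc" in version
    return major_minor == (2, 0) or (major_minor == (2, 1) and is_release_candidate)


def check_installed_tensorflow() -> str:
    """Describe whether the installed TensorFlow matches an affected version."""
    try:
        import tensorflow as tf
    except ImportError:
        return "TensorFlow is not installed."
    if reported_as_affected(tf.__version__):
        return (f"TensorFlow {tf.__version__} matches a version reported in "
                "this thread as hitting the IteratorResource error; "
                "consider upgrading to a stable 2.1 release.")
    return f"TensorFlow {tf.__version__} is not among the versions reported as affected."
```

Calling `print(check_installed_tensorflow())` before training prints a one-line summary. If the check flags the installed version, upgrading with pip (for example `pip install --upgrade tensorflow`) is the fix the last commenter described.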