returnn
Problem with loss on an output layer in a subnet, which is in a recurrent (rec) layer
Defining a loss on an output layer in a subnetwork, which is itself inside a recurrent (`rec`) layer, throws an error. See the config and error log below.
@albertz Points discussed in Slack:
- In `_SubnetworkRecCell.get_output`, there is some logic to fill the targets with data; see `for key in sorted(used_keys)` and the related code. Maybe the target was not marked in `used_keys`.
- When setting `'target': 'source'` in the output layer, it works, because `source` has a valid placeholder when the layer is initialized, but `classes` does not.
- Strangely, I could not reproduce this bug in a test case.
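The hypothesis in the first two points can be pictured with a toy model. This is only a sketch of the suspected mechanism, not RETURNN code: `ToyRecCell`, `mark_used`, `step_data` and `get_target` are hypothetical names, and the assumption is that per-step data is provided only for keys that were registered as used.

```python
# Toy model only: none of these names are RETURNN APIs.
class ToyRecCell:
    """Mimics a rec cell that provides per-step data only for registered keys."""

    def __init__(self, extern_data):
        self.extern_data = extern_data  # key -> full sequence (list of frames)
        self.used_keys = set()

    def mark_used(self, key):
        self.used_keys.add(key)

    def step_data(self, t):
        # Only keys marked as used get a per-step slice.
        return {key: self.extern_data[key][t] for key in sorted(self.used_keys)}


def get_target(step, key):
    """Look up the per-step target, failing if it was never registered."""
    if key not in step:
        raise KeyError("no per-step data for target %r" % key)
    return step[key]


cell = ToyRecCell({"source": [3, 1, 4], "classes": [0, 2, 1]})
cell.mark_used("source")  # referenced directly by a layer in the rec unit
step = cell.step_data(0)

source_ok = get_target(step, "source")
try:
    get_target(step, "classes")  # only referenced as a target in the nested subnet
    classes_failed = False
except KeyError:
    classes_failed = True
```

In this toy setting, marking `classes` as used before building the step is enough to make the lookup succeed, which is the behaviour one would expect if the missing `used_keys` entry were the cause.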
#!rnn.py
import tensorflow as tf
from returnn.tf.util.basic import DimensionTag
dec_time = DimensionTag(kind=DimensionTag.Types.Spatial, description="combined_time")
adam = True
batch_size = 4000
batching = 'random'
calculate_exp_loss = True
debug_add_check_numerics_on_output = True
debug_mode = False
debug_print_layer_output_template = True
dev = { 'class': 'TranslationDataset',
        'file_postfix': 'dev',
        'partition_epoch': 1,
        'path': '/u/hoffbauer/code/ner-configs/work/i6_nlu/ner/corpus/ConvertCoNLLToRETURNNFormat.ndGYCbAixETO/output/corpus',
        'seq_ordering': 'sorted',
        'source_postfix': '',
        'target_postfix': '',
        'unknown_label': '<UNK>'}
device = 'gpu'
eval = { 'class': 'TranslationDataset',
         'file_postfix': 'test',
         'partition_epoch': 1,
         'path': '/u/hoffbauer/code/ner-configs/work/i6_nlu/ner/corpus/ConvertCoNLLToRETURNNFormat.ndGYCbAixETO/output/corpus',
         'seq_ordering': 'sorted',
         'source_postfix': '',
         'target_postfix': '',
         'unknown_label': '<UNK>'}
extern_data = {
    'classes': {
        'available_for_inference': True,
        'dim': 20,
        'same_dim_tags_as': {'t': dec_time},
        'sparse': True,
        "batch_dim_axis": 0,
        "time_dim_axis": 1,
    }
}
learning_rate = 0.001
learning_rate_file = 'learning_rates'
log = ['./returnn.log']
log_batch_size = True
log_verbosity = 5
model = '/u/hoffbauer/code/ner-configs/work/i6_core/returnn/training/ReturnnTrainingJob.RqMUerUJyVTv/output/models/epoch'
multiprocessing = True
network = {
    'output': {
        'class': 'rec',
        'from': ['data:classes'],
        'unit': {
            'output': {
                'class': 'subnetwork',
                'from': ['prev:tag_embedding'],
                'subnetwork': {
                    'hidden': {
                        'class': 'rnn_cell',
                        'n_out': 64,
                        'unit': 'LSTMBlock'
                    },
                    'output': {
                        'class': 'linear',
                        'from': ['hidden'],
                        'n_out': 20,
                        'target': 'classes',
                        'activation': 'softmax',
                        'loss': 'ce',
                    }
                },
            },
            # 'tag': {'class': 'copy', 'initial_output': 9},
            'tag_embedding': {
                'activation': None,
                'class': 'linear',
                'from': ['data:source'],
                'n_out': 64,
                'with_bias': True
            },
        },
    },
}
num_epochs = 50
optimize_move_layers_out = False
optimizer_epsilon = 1e-06
random_seed = 0
save_interval = 1
task = 'train'
tf_log_memory_usage = True
train = { 'class': 'TranslationDataset',
          'file_postfix': 'train',
          'partition_epoch': 1,
          'path': '/u/hoffbauer/code/ner-configs/work/i6_nlu/ner/corpus/ConvertCoNLLToRETURNNFormat.ndGYCbAixETO/output/corpus',
          'seq_ordering': 'random',
          'source_postfix': '',
          'target_postfix': '',
          'unknown_label': '<UNK>'}
update_on_device = True
use_tensorflow = True
config = {}
locals().update(**config)
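To see at a glance where a loss sits in a nested network dict like the one above, a small helper can walk the dict and report every layer that defines a `loss`, together with its `target`. This is a minimal sketch and not part of RETURNN: `find_loss_layers` is a hypothetical name, and it assumes that sub-layers live under the `unit` key (for `rec` layers with a dict unit) or the `subnetwork` key, as in this config.

```python
def find_loss_layers(net_dict, prefix=""):
    """Yield (path, target) for every layer dict that defines a 'loss'."""
    for name, layer in net_dict.items():
        if not isinstance(layer, dict):
            continue
        path = prefix + name
        if "loss" in layer:
            yield path, layer.get("target")
        # Descend into nested layer dicts; a string unit such as 'LSTMBlock' is skipped.
        for key in ("unit", "subnetwork"):
            sub = layer.get(key)
            if isinstance(sub, dict):
                yield from find_loss_layers(sub, prefix=path + "/")


# Reduced copy of the network from the config above.
example_network = {
    "output": {
        "class": "rec",
        "from": ["data:classes"],
        "unit": {
            "output": {
                "class": "subnetwork",
                "from": ["prev:tag_embedding"],
                "subnetwork": {
                    "hidden": {"class": "rnn_cell", "n_out": 64, "unit": "LSTMBlock"},
                    "output": {
                        "class": "linear",
                        "from": ["hidden"],
                        "n_out": 20,
                        "target": "classes",
                        "activation": "softmax",
                        "loss": "ce",
                    },
                },
            },
            "tag_embedding": {"class": "linear", "from": ["data:source"], "n_out": 64},
        },
    },
}

losses = list(find_loss_layers(example_network))
for path, target in losses:
    print("loss defined at %s with target %r" % (path, target))
```

For this config, the only loss is two levels below the `rec` layer, which is the nesting that triggers the error reported here.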
ssh://hoffbauer@localhost:12345/work/smt4/thulke/hoffbauer/venv2/bin/python -u /u/hoffbauer/code/returnn-nlu-fork/rnn.py /u/hoffbauer/code/ner-configs/tmp/generated_configs/config_tag_lm.py
2021-07-04 12:43:21.539747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
RETURNN starting up, version 1.0.0+unknown, date/time 2021-07-04-12-43-24 (UTC+0200), pid 4684, cwd /u/hoffbauer/code/debug_logs, Python /work/smt4/thulke/hoffbauer/venv2/bin/python
RETURNN command line options: ['/u/hoffbauer/code/ner-configs/tmp/generated_configs/config_tag_lm.py']
Hostname: cluster-cn-211
TensorFlow: 2.3.0 (v2.3.0-rc2-23-gb36436b087) (<site-package> in /work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
2021-07-04 12:43:24.629362: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2600090000 Hz
2021-07-04 12:43:24.629898: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c996e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-04 12:43:24.629983: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-04 12:43:24.641531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-07-04 12:43:25.287954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:25.288014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2021-07-04 12:43:25.364327: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d77c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-04 12:43:25.364434: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 980, Compute Capability 5.2
2021-07-04 12:43:25.366698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: GeForce GTX 980 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 16 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 208.91GiB/s
2021-07-04 12:43:25.366796: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:25.372801: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-04 12:43:25.377816: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-04 12:43:25.378475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-04 12:43:25.382724: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-04 12:43:25.386133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-04 12:43:25.394519: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-04 12:43:25.396948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-04 12:43:25.397105: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:26.648332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:26.648396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-04 12:43:26.648410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-04 12:43:26.650641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 3552 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2)
Local devices available to TensorFlow:
1/4: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1320100640957602680
2/4: name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16146905449886550846
physical_device_desc: "device: XLA_CPU device"
3/4: name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 7133626082724591982
physical_device_desc: "device: XLA_GPU device"
4/4: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3724541952
locality {
bus_id: 3
numa_node: 2
links {
}
}
incarnation: 16342895949960684598
physical_device_desc: "device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2"
Using gpu device 0: GeForce GTX 980
<TranslationDataset 'dev' epoch=1>: waiting for data length info...
<TranslationDataset 'eval' epoch=1>: waiting for data length info...
<TranslationDataset 'train' epoch=1>: waiting for data length info...
Train data:
input: 22960 x 1
output: {'data': [22960, 1], 'classes': [20, 1]}
TranslationDataset, sequences: 14041, frames: unknown
Dev data:
TranslationDataset, sequences: 3250, frames: unknown
Eval data:
TranslationDataset, sequences: 3453, frames: unknown
Learning-rate-control: loading file learning_rates
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2021-07-04 12:43:30.319245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: GeForce GTX 980 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 16 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 208.91GiB/s
2021-07-04 12:43:30.319330: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:30.319383: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-04 12:43:30.319419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-04 12:43:30.319454: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-04 12:43:30.319488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-04 12:43:30.319521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-04 12:43:30.319560: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-04 12:43:30.321460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-04 12:43:30.321520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:30.321532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-04 12:43:30.321543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-04 12:43:30.323463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3552 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2)
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py:429: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer root/'data:classes' output: Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
layer root/'output' output: Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/util/basic.py:1285: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py:3790: LSTMCell.__init__ (from tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
layer root/output(rec-subnet)/output(subnet)/'hidden' output: Data(name='hidden_output', batch_shape_meta=[B,F|64])
WARNING:tensorflow:From /work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:962: Layer.add_variable (from tensorflow.python.keras.engine.base_layer_v1) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
debug_add_check_numerics_on_output: add for layer 'hidden': <tf.Tensor 'output/rec/output/hidden/rec/lstm_cell/mul_2:0' shape=(?, 64) dtype=float32>
layer root/output(rec-subnet)/'data:classes' output: Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])
Exception creating layer root/output(rec-subnet)/'data:classes' of class SourceLayer with opts:
{'_name': 'data:classes',
'_network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'data_key': 'classes',
'name': 'data:classes',
'network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B]),
'sources': []}
Exception occurred during in-loop construction of layer 'data:classes'.
Exception occurred during in-loop construction of layer 'output/output'.
Exception occurred during in-loop construction of layer 'output'.
Exception creating layer root/'output' of class RecLayer with opts:
{'_name': 'output',
'_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'n_out': <class 'returnn.util.basic.NotSpecified'>,
'name': 'output',
'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20]),
'sources': [<SourceLayer 'data:classes' out_type=Data(dtype='int32', sparse=True, dim=20, batch_shape_meta=[B,T|'time:var:extern_data:classes'])>],
'unit': {'output': {'class': 'subnetwork',
'from': ['prev:tag_embedding'],
'subnetwork': {'hidden': {'class': 'rnn_cell',
'n_out': 64,
'unit': 'LSTMBlock'},
'output': {'activation': 'softmax',
'class': 'linear',
'from': ['hidden'],
'loss': 'ce',
'n_out': 20,
'target': 'classes'}}},
'tag_embedding': {'activation': None,
'class': 'linear',
'from': ['data:source'],
'n_out': 64,
'with_bias': True}}}
EXCEPTION
Traceback (most recent call last):
File "/u/hoffbauer/code/returnn-nlu-fork/rnn.py", line 11, in <module>
line: main()
locals:
main = <local> <function main at 0x7f0e45ad8940>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/__main__.py", line 653,...
The test case has `optimize_move_layers_out = False`, which you should never use.
The log is incomplete. Here is the complete log from Slack:
ssh://hoffbauer@localhost:12345/work/smt4/thulke/hoffbauer/venv2/bin/python -u /u/hoffbauer/code/returnn-nlu-fork/rnn.py /u/hoffbauer/code/ner-configs/tmp/generated_configs/config_tag_lm.py
2021-07-04 12:43:21.539747: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
RETURNN starting up, version 1.0.0+unknown, date/time 2021-07-04-12-43-24 (UTC+0200), pid 4684, cwd /u/hoffbauer/code/debug_logs, Python /work/smt4/thulke/hoffbauer/venv2/bin/python
RETURNN command line options: ['/u/hoffbauer/code/ner-configs/tmp/generated_configs/config_tag_lm.py']
Hostname: cluster-cn-211
TensorFlow: 2.3.0 (v2.3.0-rc2-23-gb36436b087) (<site-package> in /work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
2021-07-04 12:43:24.629362: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2600090000 Hz
2021-07-04 12:43:24.629898: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c996e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-04 12:43:24.629983: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-04 12:43:24.641531: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-07-04 12:43:25.287954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:25.288014: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
CUDA_VISIBLE_DEVICES is set to '0'.
Collecting TensorFlow device list...
2021-07-04 12:43:25.364327: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d77c50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-04 12:43:25.364434: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 980, Compute Capability 5.2
2021-07-04 12:43:25.366698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: GeForce GTX 980 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 16 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 208.91GiB/s
2021-07-04 12:43:25.366796: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:25.372801: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-04 12:43:25.377816: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-04 12:43:25.378475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-04 12:43:25.382724: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-04 12:43:25.386133: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-04 12:43:25.394519: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-04 12:43:25.396948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-04 12:43:25.397105: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:26.648332: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:26.648396: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-04 12:43:26.648410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-04 12:43:26.650641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 3552 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2)
Local devices available to TensorFlow:
1/4: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1320100640957602680
2/4: name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16146905449886550846
physical_device_desc: "device: XLA_CPU device"
3/4: name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 7133626082724591982
physical_device_desc: "device: XLA_GPU device"
4/4: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 3724541952
locality {
bus_id: 3
numa_node: 2
links {
}
}
incarnation: 16342895949960684598
physical_device_desc: "device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2"
Using gpu device 0: GeForce GTX 980
<TranslationDataset 'dev' epoch=1>: waiting for data length info...
<TranslationDataset 'eval' epoch=1>: waiting for data length info...
<TranslationDataset 'train' epoch=1>: waiting for data length info...
Train data:
input: 22960 x 1
output: {'data': [22960, 1], 'classes': [20, 1]}
TranslationDataset, sequences: 14041, frames: unknown
Dev data:
TranslationDataset, sequences: 3250, frames: unknown
Eval data:
TranslationDataset, sequences: 3453, frames: unknown
Learning-rate-control: loading file learning_rates
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2021-07-04 12:43:30.319245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: GeForce GTX 980 computeCapability: 5.2
coreClock: 1.266GHz coreCount: 16 deviceMemorySize: 3.95GiB deviceMemoryBandwidth: 208.91GiB/s
2021-07-04 12:43:30.319330: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-07-04 12:43:30.319383: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-07-04 12:43:30.319419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-07-04 12:43:30.319454: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-07-04 12:43:30.319488: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-07-04 12:43:30.319521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-07-04 12:43:30.319560: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-07-04 12:43:30.321460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-04 12:43:30.321520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-04 12:43:30.321532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-04 12:43:30.321543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-04 12:43:30.323463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3552 MB memory) -> physical GPU (device: 0, name: GeForce GTX 980, pci bus id: 0000:41:00.0, compute capability: 5.2)
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py:429: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer root/'data:classes' output: Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B,T|'time:var:extern_data:classes'])
layer root/'output' output: Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/util/basic.py:1285: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
WARNING:tensorflow:From /u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py:3790: LSTMCell.__init__ (from tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
layer root/output(rec-subnet)/output(subnet)/'hidden' output: Data(name='hidden_output', batch_shape_meta=[B,F|64])
WARNING:tensorflow:From /work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:962: Layer.add_variable (from tensorflow.python.keras.engine.base_layer_v1) is deprecated and will be removed in a future version.
Instructions for updating:
Please use `layer.add_weight` method instead.
debug_add_check_numerics_on_output: add for layer 'hidden': <tf.Tensor 'output/rec/output/hidden/rec/lstm_cell/mul_2:0' shape=(?, 64) dtype=float32>
layer root/output(rec-subnet)/'data:classes' output: Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])
Exception creating layer root/output(rec-subnet)/'data:classes' of class SourceLayer with opts:
{'_name': 'data:classes',
'_network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'data_key': 'classes',
'name': 'data:classes',
'network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B]),
'sources': []}
Exception occurred during in-loop construction of layer 'data:classes'.
Exception occurred during in-loop construction of layer 'output/output'.
Exception occurred during in-loop construction of layer 'output'.
Exception creating layer root/'output' of class RecLayer with opts:
{'_name': 'output',
'_network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'n_out': <class 'returnn.util.basic.NotSpecified'>,
'name': 'output',
'network': <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>,
'output': Data(name='output_output', batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20]),
'sources': [<SourceLayer 'data:classes' out_type=Data(dtype='int32', sparse=True, dim=20, batch_shape_meta=[B,T|'time:var:extern_data:classes'])>],
'unit': {'output': {'class': 'subnetwork',
'from': ['prev:tag_embedding'],
'subnetwork': {'hidden': {'class': 'rnn_cell',
'n_out': 64,
'unit': 'LSTMBlock'},
'output': {'activation': 'softmax',
'class': 'linear',
'from': ['hidden'],
'loss': 'ce',
'n_out': 20,
'target': 'classes'}}},
'tag_embedding': {'activation': None,
'class': 'linear',
'from': ['data:source'],
'n_out': 64,
'with_bias': True}}}
EXCEPTION
Traceback (most recent call last):
File "/u/hoffbauer/code/returnn-nlu-fork/rnn.py", line 11, in <module>
line: main()
locals:
main = <local> <function main at 0x7f0e45ad8940>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/__main__.py", line 653, in main
line: execute_main_task()
locals:
execute_main_task = <global> <function execute_main_task at 0x7f0e45ad8820>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/__main__.py", line 451, in execute_main_task
line: engine.init_train_from_config(config, train_data, dev_data, eval_data)
locals:
engine = <global> <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>
engine.init_train_from_config = <global> <bound method Engine.init_train_from_config of <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>>
config = <global> <returnn.config.Config object at 0x7f0e54979d30>
train_data = <global> <TranslationDataset 'train' epoch=1>
dev_data = <global> <TranslationDataset 'dev' epoch=1>
eval_data = <global> <TranslationDataset 'eval' epoch=1>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/engine.py", line 1029, in Engine.init_train_from_config
line: self.init_network_from_config(config)
locals:
self = <local> <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>
self.init_network_from_config = <local> <bound method Engine.init_network_from_config of <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>>
config = <local> <returnn.config.Config object at 0x7f0e54979d30>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/engine.py", line 1094, in Engine.init_network_from_config
line: self._init_network(net_desc=net_dict, epoch=self.epoch)
locals:
self = <local> <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>
self._init_network = <local> <bound method Engine._init_network of <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>>
net_desc = <not found>
net_dict = <local> {'output': {'class': 'rec', 'from': ['data:classes'], 'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'a...
epoch = <local> None
self.epoch = <local> 1
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/engine.py", line 1273, in Engine._init_network
line: self.network, self.updater = self.create_network(
config=self.config,
extern_data=extern_data,
rnd_seed=net_random_seed,
train_flag=train_flag, eval_flag=self.use_eval_flag, search_flag=self.use_search_flag,
initial_learning_rate=getattr(self, "initial_learning_rate", None),
net_dict=net_desc)
locals:
self = <local> <returnn.tf.engine.Engine object at 0x7f0e0f5e52e0>
self.network = <local> None
self.updater = <local> None
self.create_network = <local> <bound method Engine.create_network of <class 'returnn.tf.engine.Engine'>>
config = <not found>
self.config = <local> <returnn.config.Config object at 0x7f0e54979d30>
extern_data = <local> <ExternData data={'classes': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B,T|'time:var:extern_data:classes'])}>
rnd_seed = <not found>
net_random_seed = <local> 1
train_flag = <local> <tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>
eval_flag = <not found>
self.use_eval_flag = <local> True
search_flag = <not found>
self.use_search_flag = <local> False
initial_learning_rate = <not found>
getattr = <builtin> <built-in function getattr>
net_dict = <not found>
net_desc = <local> {'output': {'class': 'rec', 'from': ['data:classes'], 'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'a...
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/engine.py", line 1314, in Engine.create_network
line: network.construct_from_dict(net_dict)
locals:
network = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
network.construct_from_dict = <local> <bound method TFNetwork.construct_from_dict of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'output': {'class': 'rec', 'from': ['data:classes'], 'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'a...
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 558, in TFNetwork.construct_from_dict
line: self.construct_layer(net_dict, name, get_layer=get_layer)
locals:
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
net_dict = <local> {'output': {'class': 'rec', 'from': ['data:classes'], 'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'a...
name = <local> 'output', len = 6
get_layer = <local> None
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 882, in TFNetwork.construct_layer
line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'output', len = 6
name_with_prefix = <local> 'output', len = 6
layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
layer_desc = <local> {'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embeddin...
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 1037, in TFNetwork.add_layer
line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
layer = <not found>
self = <local> <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root' train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'output', len = 6
layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
layer_desc = <local> {'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embeddin...
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 959, in TFNetwork._create_layer
line: layer = layer_class(**layer_desc)
locals:
layer = <not found>
layer_class = <local> <class 'returnn.tf.layers.rec.RecLayer'>
layer_desc = <local> {'unit': {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embeddin..., len = 8
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 232, in RecLayer.__init__
line: y = self._get_output_subnet_unit(self.cell)
locals:
y = <not found>
self = <local> <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])>
self._get_output_subnet_unit = <local> <bound method RecLayer._get_output_subnet_unit of <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])>>
self.cell = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 906, in RecLayer._get_output_subnet_unit
line: output = cell.get_output(rec_layer=self)
locals:
output = <not found>
cell = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
cell.get_output = <local> <bound method _SubnetworkRecCell.get_output of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
rec_layer = <not found>
self = <local> <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 2422, in _SubnetworkRecCell.get_output
line: final_loop_vars = self._while_loop(
cond=cond,
body=body,
loop_vars=init_loop_vars,
shape_invariants=shape_invariants)
locals:
final_loop_vars = <not found>
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self._while_loop = <local> <bound method _SubnetworkRecCell._while_loop of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
cond = <local> <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
loop_vars = <not found>
init_loop_vars = <local> (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/o...
shape_invariants = <local> (TensorShape([]), ([TensorShape([Dimension(None), Dimension(64)])], [[LSTMStateTuple(c=TensorShape([Dimension(None), Dimension(64)]), h=TensorShape([Dimension(None), Dimension(64)]))]]), [TensorShape(None), TensorShape(None), TensorShape(None)]), _[0]: {len = 0}
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 1771, in _SubnetworkRecCell._while_loop
line: return tf.while_loop(
cond=cond,
body=body,
loop_vars=loop_vars,
shape_invariants=shape_invariants,
back_prop=self.parent_rec_layer.back_prop)
locals:
tf = <global> <module 'tensorflow' from '/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/__init__.py'>
tf.while_loop = <global> <function while_loop_v2 at 0x7f0e2275bb80>
cond = <local> <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
loop_vars = <local> (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/o...
shape_invariants = <local> (TensorShape([]), ([TensorShape([Dimension(None), Dimension(64)])], [[LSTMStateTuple(c=TensorShape([Dimension(None), Dimension(64)]), h=TensorShape([Dimension(None), Dimension(64)]))]]), [TensorShape(None), TensorShape(None), TensorShape(None)]), _[0]: {len = 0}
back_prop = <not found>
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self.parent_rec_layer = <local> <RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])>
self.parent_rec_layer.back_prop = <local> True
File "/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/util/deprecation.py", line 574, in while_loop_v2
line: return func(*args, **kwargs)
locals:
func = <local> <function while_loop_v2 at 0x7f0e2275baf0>
args = <local> ()
kwargs = <local> {'cond': <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>, 'body': <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>, 'loop_vars': (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_z...
File "/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2489, in while_loop_v2
line: return while_loop(
cond=cond,
body=body,
loop_vars=loop_vars,
shape_invariants=shape_invariants,
parallel_iterations=parallel_iterations,
back_prop=back_prop,
swap_memory=swap_memory,
name=name,
maximum_iterations=maximum_iterations,
return_same_structure=True)
locals:
while_loop = <global> <function while_loop at 0x7f0e2275a9d0>
cond = <local> <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
loop_vars = <local> (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/o...
shape_invariants = <local> (TensorShape([]), ([TensorShape([Dimension(None), Dimension(64)])], [[LSTMStateTuple(c=TensorShape([Dimension(None), Dimension(64)]), h=TensorShape([Dimension(None), Dimension(64)]))]]), [TensorShape(None), TensorShape(None), TensorShape(None)]), _[0]: {len = 0}
parallel_iterations = <local> 10
back_prop = <local> True
swap_memory = <local> False
name = <local> None
maximum_iterations = <local> None
return_same_structure = <not found>
File "/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2773, in while_loop
line: result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants,
return_same_structure)
locals:
result = <not found>
loop_context = <local> <tensorflow.python.ops.control_flow_ops.WhileContext object at 0x7f0d7a2bb1f0>
loop_context.BuildLoop = <local> <bound method WhileContext.BuildLoop of <tensorflow.python.ops.control_flow_ops.WhileContext object at 0x7f0d7a2bb1f0>>
cond = <local> <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
loop_vars = <local> (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/o...
shape_invariants = <local> (TensorShape([]), ([TensorShape([Dimension(None), Dimension(64)])], [[LSTMStateTuple(c=TensorShape([Dimension(None), Dimension(64)]), h=TensorShape([Dimension(None), Dimension(64)]))]]), [TensorShape(None), TensorShape(None), TensorShape(None)]), _[0]: {len = 0}
return_same_structure = <local> True
File "/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2255, in WhileContext.BuildLoop
line: original_body_result, exit_vars = self._BuildLoop(
pred, body, original_loop_vars, loop_vars, shape_invariants)
locals:
original_body_result = <not found>
exit_vars = <not found>
self = <local> <tensorflow.python.ops.control_flow_ops.WhileContext object at 0x7f0d7a2bb1f0>
self._BuildLoop = <local> <bound method WhileContext._BuildLoop of <tensorflow.python.ops.control_flow_ops.WhileContext object at 0x7f0d7a2bb1f0>>
pred = <local> <function _SubnetworkRecCell.get_output.<locals>.cond at 0x7f0d7a2b3670>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
original_loop_vars = <local> (<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/o...
loop_vars = <local> [<tf.Tensor 'output/rec/initial_i:0' shape=() dtype=int32>, <tf.Tensor 'output/rec/tag_embedding/init_tag_embedding_zeros:0' shape=(?, 64) dtype=float32>, <tf.Tensor 'output/rec/output/hidden/rec_initial_state/zeros:0' shape=(?, 64) dtype=float32>, <tf.Tensor 'output/rec/output/hidden/rec_initial..., len = 7
shape_invariants = <local> (TensorShape([]), ([TensorShape([Dimension(None), Dimension(64)])], [[LSTMStateTuple(c=TensorShape([Dimension(None), Dimension(64)]), h=TensorShape([Dimension(None), Dimension(64)]))]]), [TensorShape(None), TensorShape(None), TensorShape(None)]), _[0]: {len = 0}
File "/work/smt4/thulke/hoffbauer/venv2/lib/python3.8/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2181, in WhileContext._BuildLoop
line: body_result = body(*packed_vars_for_body)
locals:
body_result = <not found>
body = <local> <function _SubnetworkRecCell.get_output.<locals>.body at 0x7f0d7a2b35e0>
packed_vars_for_body = <local> (<tf.Tensor 'output/rec/while/Identity:0' shape=() dtype=int32>, ([<tf.Tensor 'output/rec/while/Identity_1:0' shape=(?, 64) dtype=float32>], [[LSTMStateTuple(c=<tf.Tensor 'output/rec/while/Identity_2:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/while/Identity_3:0' shape=(?, 64) dtype...
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 2264, in _SubnetworkRecCell.get_output.<locals>.body
line: self._construct(
prev_outputs=prev_outputs, prev_extra=prev_extra,
i=i,
data=data_,
inputs_moved_out_tas=input_layers_moved_out_tas,
needed_outputs=needed_outputs)
locals:
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self._construct = <local> <bound method _SubnetworkRecCell._construct of <_SubnetworkRecCell 'root/output(rec-subnet)'>>
prev_outputs = <local> {'tag_embedding': <tf.Tensor 'output/rec/while_loop_body/prev_outputs/identity_tag_embedding:0' shape=(?, 64) dtype=float32>}
prev_extra = <local> {'output/hidden': {'state': LSTMStateTuple(c=<tf.Tensor 'output/rec/while_loop_body/prev_extra/identity_output/hidden_state_0:0' shape=(?, 64) dtype=float32>, h=<tf.Tensor 'output/rec/while_loop_body/prev_extra/identity_output/hidden_state_1:0' shape=(?, 64) dtype=float32>)}}
i = <local> <tf.Tensor 'output/rec/while/Identity:0' shape=() dtype=int32>
data = <not found>
data_ = <local> {'source': <tf.Tensor 'output/rec/while_loop_body/source_ta_read:0' shape=(?,) dtype=int32>}
inputs_moved_out_tas = <not found>
input_layers_moved_out_tas = <local> {}
needed_outputs = <local> {'output/output', 'output'}, len = 2
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 1546, in _SubnetworkRecCell._construct
line: get_layer(layer_name)
locals:
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
layer_name = <local> 'output', len = 6
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 1522, in _SubnetworkRecCell._construct.<locals>.get_layer
line: layer = self.net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
locals:
layer = <not found>
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self.net = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
self.net_dict = <local> {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embedding': {'act...
name = <local> 'output', len = 6
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 875, in TFNetwork.construct_layer
line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
locals:
layer_class = <local> <class 'returnn.tf.layers.basic.SubnetworkLayer'>
layer_class.transform_config_dict = <local> <bound method SubnetworkLayer.transform_config_dict of <class 'returnn.tf.layers.basic.SubnetworkLayer'>>
layer_desc = <local> {'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}, '_network': <TFNetwork 'root/output(rec-subnet)' parent_l...
network = <not found>
net = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/basic.py", line 6595, in SubnetworkLayer.transform_config_dict
line: d["_output"] = subnet.construct_layer("output", parent_get_layer=get_layer)
locals:
d = <local> {'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}, '_network': <TFNetwork 'root/output(rec-subnet)' parent_l...
subnet = <local> Subnetwork{root/output(rec-subnet)/output(subnet)}
subnet.construct_layer = <local> <bound method Subnetwork.construct_layer of Subnetwork{root/output(rec-subnet)/output(subnet)}>
parent_get_layer = <not found>
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2623, in Subnetwork.construct_layer
line: return self.get_sub_layer_func(parent_get_layer)(name)
locals:
self = <local> Subnetwork{root/output(rec-subnet)/output(subnet)}
self.get_sub_layer_func = <local> <bound method Subnetwork.get_sub_layer_func of Subnetwork{root/output(rec-subnet)/output(subnet)}>
parent_get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
name = <local> 'output', len = 6
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2609, in Subnetwork.get_layer_func.<locals>.wrapped_get_layer
line: return get_layer(name)
locals:
get_layer = <local> <function Subnetwork.get_sub_layer_func.<locals>.wrapped_get_layer at 0x7f0d7a2e5d30>
name = <local> 'output', len = 6
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2590, in Subnetwork.get_sub_layer_func.<locals>.wrapped_get_layer
line: return base_get_layer(self.name_in_parent + "/" + name)
locals:
base_get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
self = <local> Subnetwork{root/output(rec-subnet)/output(subnet)}
self.name_in_parent = <local> 'output', len = 6
name = <local> 'output', len = 6
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 1522, in _SubnetworkRecCell._construct.<locals>.get_layer
line: layer = self.net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
locals:
layer = <not found>
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self.net = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
self.net_dict = <local> {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embedding': {'act...
name = <local> 'output/output', len = 13
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 875, in TFNetwork.construct_layer
line: layer_class.transform_config_dict(layer_desc, network=net, get_layer=get_layer)
locals:
layer_class = <local> <class 'returnn.tf.layers.basic.LinearLayer'>
layer_class.transform_config_dict = <local> <bound method LayerBase.transform_config_dict of <class 'returnn.tf.layers.basic.LinearLayer'>>
layer_desc = <local> {'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce', '_network': <TFNetwork 'root/output(rec-subnet)/output(subnet)' parent_layer=<InternalLayer output/'output' out_type=Data(batch_shape_meta=[B?])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'outp..., len = 8
network = <not found>
net = <local> <TFNetwork 'root/output(rec-subnet)/output(subnet)' parent_layer=<InternalLayer output/'output' out_type=Data(batch_shape_meta=[B?])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
get_layer = <local> <function Subnetwork.get_layer_func.<locals>.wrapped_get_layer at 0x7f0d7a2e5ee0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/base.py", line 501, in LayerBase.transform_config_dict
line: target_layers[target] = get_layer("data:%s" % target)
locals:
target_layers = <local> {}
target = <local> 'classes', len = 7
get_layer = <local> <function Subnetwork.get_layer_func.<locals>.wrapped_get_layer at 0x7f0d7a2e5ee0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2608, in Subnetwork.get_layer_func.<locals>.wrapped_get_layer
line: return self._get_data(name=name[len("data:"):], get_layer=get_layer)
locals:
self = <local> Subnetwork{root/output(rec-subnet)/output(subnet)}
self._get_data = <local> <bound method Subnetwork._get_data of Subnetwork{root/output(rec-subnet)/output(subnet)}>
name = <local> 'data:classes', len = 12
len = <builtin> <built-in function len>
get_layer = <local> <function Subnetwork.get_sub_layer_func.<locals>.wrapped_get_layer at 0x7f0d7a2e5e50>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2539, in Subnetwork._get_data
line: return base_get_layer("data:%s" % name)
locals:
base_get_layer = <local> <function Subnetwork._get_data.<locals>.base_get_layer at 0x7f0d7a2e5f70>
name = <local> 'classes', len = 7
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2524, in Subnetwork._get_data.<locals>.base_get_layer
line: return get_layer("base:" + name_)
locals:
get_layer = <local> <function Subnetwork.get_sub_layer_func.<locals>.wrapped_get_layer at 0x7f0d7a2e5e50>
name_ = <local> 'data:classes', len = 12
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 2576, in Subnetwork.get_sub_layer_func.<locals>.wrapped_get_layer
line: return base_get_layer(name[len("base:"):])
locals:
base_get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
name = <local> 'base:data:classes', len = 17
len = <builtin> <built-in function len>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/rec.py", line 1522, in _SubnetworkRecCell._construct.<locals>.get_layer
line: layer = self.net.construct_layer(self.net_dict, name=name, get_layer=get_layer)
locals:
layer = <not found>
self = <local> <_SubnetworkRecCell 'root/output(rec-subnet)'>
self.net = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self.net.construct_layer = <local> <bound method TFNetwork.construct_layer of <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
self.net_dict = <local> {'output': {'class': 'subnetwork', 'from': ['prev:tag_embedding'], 'subnetwork': {'hidden': {'class': 'rnn_cell', 'n_out': 64, 'unit': 'LSTMBlock'}, 'output': {'class': 'linear', 'from': ['hidden'], 'n_out': 20, 'target': 'classes', 'activation': 'softmax', 'loss': 'ce'}}}, 'tag_embedding': {'act...
name = <local> 'data:classes', len = 12
get_layer = <local> <function _SubnetworkRecCell._construct.<locals>.get_layer at 0x7f0d7a2e5ca0>
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 882, in TFNetwork.construct_layer
line: return add_layer(name=name_with_prefix, layer_class=layer_class, **layer_desc)
locals:
add_layer = <local> <bound method TFNetwork.add_layer of <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'data:classes', len = 12
name_with_prefix = <local> 'data:classes', len = 12
layer_class = <local> <class 'returnn.tf.layers.basic.SourceLayer'>
layer_desc = <local> {'data_key': 'classes', '_network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'data:classes', 'sources': []}
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 1037, in TFNetwork.add_layer
line: layer = self._create_layer(name=name, layer_class=layer_class, **layer_desc)
locals:
layer = <not found>
self = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
self._create_layer = <local> <bound method TFNetwork._create_layer of <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>>
name = <local> 'data:classes', len = 12
layer_class = <local> <class 'returnn.tf.layers.basic.SourceLayer'>
layer_desc = <local> {'data_key': 'classes', '_network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'data:classes', 'sources': []}
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/network.py", line 959, in TFNetwork._create_layer
line: layer = layer_class(**layer_desc)
locals:
layer = <not found>
layer_class = <local> <class 'returnn.tf.layers.basic.SourceLayer'>
layer_desc = <local> {'data_key': 'classes', '_network': <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>, '_name': 'data:classes', 'sources': [], 'name': 'data:..., len = 7
File "/u/hoffbauer/code/returnn-nlu-fork/returnn/tf/layers/basic.py", line 38, in SourceLayer.__init__
line: raise Exception("%r: data %r:%r only exists as template. You can only use %r." % (
self, data_key, data,
{k: v for (k, v) in network.extern_data.data.items() if v.placeholder is not None}))
locals:
Exception = <builtin> <class 'Exception'>
self = <local> <SourceLayer output/'data:classes' out_type=Data(dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])>
data_key = <local> 'classes', len = 7
data = <local> Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])
k = <not found>
v = <not found>
network = <local> <TFNetwork 'root/output(rec-subnet)' parent_layer=<RecLayer 'output' out_type=Data(batch_shape_meta=[T|'time:var:extern_data:classes',B,F|20])> train=<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>>
network.extern_data = <local> <ExternData data={'source': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B]), 'classes': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])}>
network.extern_data.data = <local> {'source': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B]), 'classes': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])}
network.extern_data.data.items = <local> <built-in method items of dict object at 0x7f0df415e780>
v.placeholder = <not found>
Exception: <SourceLayer output/'data:classes' out_type=Data(dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])>: data 'classes':Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B]) only exists as template. You can only use {'source': Data(name='classes', dtype='int32', sparse=True, dim=20, batch_shape_meta=[B])}.
Process finished with exit code 1
Can confirm it works when optimize_move_layers_out = False is removed from the config, i.e. with the optimization enabled.
Closing
The original bug is not solved. It is just a coincidence that you do not run into it with proper optimization: in your case, all layers happen to be moved out of the loop, so the problem does not occur.
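To illustrate the failing condition from the traceback, here is a minimal sketch in plain Python. It does not use RETURNN: `Data` and `get_source` are hypothetical stand-ins that only mimic the placeholder check raising in `SourceLayer.__init__`. Inside the loop body, only `source` was fed (`data_` contains just that key), so `classes` exists only as a template.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Data:
    """Hypothetical stand-in for RETURNN's Data; only the fields needed here."""
    name: str
    placeholder: Optional[object] = None  # None means "template only"


def get_source(extern_data: Dict[str, Data], data_key: str) -> Data:
    """Mimics the check that raises in SourceLayer.__init__ (simplified, assumed)."""
    data = extern_data[data_key]
    if data.placeholder is None:
        # Only keys that actually have a placeholder can be used.
        usable = {k: v for k, v in extern_data.items() if v.placeholder is not None}
        raise Exception(
            "data %r:%r only exists as template. You can only use %r."
            % (data_key, data, usable))
    return data


# State as seen in the traceback: inside the rec loop, 'source' has a
# placeholder (read from a TensorArray), 'classes' has none.
in_loop = {
    "source": Data("classes", placeholder="source_ta_read"),
    "classes": Data("classes"),
}

get_source(in_loop, "source")  # fine, matches the 'target': 'source' workaround
try:
    get_source(in_loop, "classes")
except Exception as exc:
    print(exc)
```

When all layers are moved out of the loop, the target lookup presumably resolves against the outer network, where `classes` has a real placeholder, which would explain why the error does not show up in that case.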
Ok. I see.
Do you get this error with the latest RETURNN version?