TF get_sprint_automata_for_batch: RASR segmentation fault in `Speech::CTCTopologyGraphBuilder::addLoopTransition`
I created an Apptainer image with TF 2.13 and tried to run a training with FastBaumWelchLoss. It crashes in step 0 because the get_sprint_automata_for_batch op is not found.
The actual error is this:
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
Ah, that traceback is just from `help_on_tf_exception`, which is not critical: it exists only for debugging, to print some additional information, and for some reason it fails here.
But it means that another, actual exception happened before it. Can you post the full log?
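As a side note on why the `EOFError` above is only a symptom: unpickling from a pipe whose writing end was closed without sending anything fails in exactly this way. The following is a minimal sketch, not RETURNN's actual code. The helper `read_from_child` is hypothetical, and closing the write end right away is just an assumed stand-in for a child process that died before answering.

```python
import os
import pickle


def read_from_child(read_fd):
    """Unpickle one message from a pipe (hypothetical helper, not RETURNN's code)."""
    with os.fdopen(read_fd, "rb") as p:
        return pickle.Unpickler(p).load()


# Create a pipe and close the write end immediately.
# Assumption: this mimics a child process that exits before sending any data.
r, w = os.pipe()
os.close(w)

try:
    read_from_child(r)
    result = "no error"
except EOFError as exc:
    # The reader only sees "no more input"; the real cause is on the child's side.
    result = f"EOFError: {exc}"

print(result)
```

So the reader's traceback says nothing about what went wrong in the child, which is why the full log (and the child's own log file) is needed.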
Sure, the full log is here:
RETURNN starting up, version 1.20231107.125810+git.dbef0ca0, date/time 2023-11-08-12-17-46 (UTC+0100), pid 1212279, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-04
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is not set.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
1/1: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 3855380559335333431
xla_global_id: -1
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
RETURNN starting up, version 1.20231107.125810+git.dbef0ca0, date/time 2023-11-08-12-18-11 (UTC+0100), pid 3325131, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-285
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 7046766875533982763
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10089005056
locality {
bus_id: 1
links {
}
}
incarnation: 14158601620701111509
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:41:00.0, compute capability: 7.5"
xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-285', GPU 2, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer batch_norm 'conformer_1_conv_mod_bn' #: 512
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_1_conv_mod_dropout' #: 512
layer gating 'conformer_1_conv_mod_glu' #: 512
layer layer_norm 'conformer_1_conv_mod_ln' #: 512
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_1_conv_mod_res_add' #: 512
layer activation 'conformer_1_conv_mod_swish' #: 512
layer copy 'conformer_1_ffmod_1_dropout' #: 512
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
layer copy 'conformer_1_ffmod_2_dropout' #: 512
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
layer copy 'conformer_1_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_1_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_1_output' #: 512
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer copy 'input_dropout' #: 512
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-11-18-11
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 3325822
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', 
'--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
Exception UnknownError() in step 0. (pid 3325131)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f2192d77d30>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f2538137300>
self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f2423372a70>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
handle = <local> None
final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f2192d77d30>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <..., len = 14876
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__
self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init
self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7f25711ccdf0>
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat [34;1m=[0m [34m<local>[0m [34m<[0mfunction pformat at 0x7f2575517c10[34m>[0m
[31mAssertionError[0m: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 3325131)
See also in /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn.log to avoid the broken color codes here.
I created a script to reproduce the error: vieting@cn-285:/work/asr4/vieting/tmp/20231108_tf213_sprint_op $ ./run_example.sh
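For anyone who only has a pasted copy of such a log and not the original returnn.log, the color codes can be removed after the fact. The following is a minimal sketch, not part of RETURNN: the function name `strip_color_codes` is hypothetical, and it assumes the codes are ANSI SGR sequences that may have lost their leading ESC byte during copy and paste (so they show up as `[34;1m` or `[0m`).

```python
import re

# Matches SGR color sequences such as "\x1b[34;1m", and also the damaged
# form "[34;1m" where the ESC byte was lost during copy and paste.
_SGR_RE = re.compile(r"\x1b?\[\d+(?:;\d+)*m")


def strip_color_codes(text: str) -> str:
    """Remove ANSI SGR color codes, with or without the ESC byte."""
    return _SGR_RE.sub("", text)


if __name__ == "__main__":
    damaged = "[31mAssertionError[0m: target_op"
    print(strip_color_codes(damaged))
```

Because the ESC byte is treated as optional, the pattern would also remove a literal substring such as `[12m` from ordinary text, so it is only meant for cleaning up log pastes like this one.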
We encountered this bug and there is a patch for it. Daniel wanted to do a PR.
On Wed, Nov 8, 2023, 12:25 vieting @.***> wrote:
main() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 634, in main execute_main_task() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 439, in execute_main_task engine.init_train_from_config(config, train_data, dev_data, eval_data) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config self.init_network_from_config(config) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config self._init_network(net_desc=net_dict, epoch=self.epoch) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network self.network, self.updater = self.create_network( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in init self.loss = network.get_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective self.maybe_construct_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in 
maybe_construct_objective self._construct_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized if loss_obj.get_loss_value_for_objective() is not None: File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective self._prepare() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare self._loss_value = self.loss.get_value() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op edges, weights, start_end_states = tf_compat.v1.py_func( Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last): File 
"/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in main() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 634, in main execute_main_task() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 439, in execute_main_task engine.init_train_from_config(config, train_data, dev_data, eval_data) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config self.init_network_from_config(config) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config self._init_network(net_desc=net_dict, epoch=self.epoch) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network self.network, self.updater = self.create_network( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in init self.loss = network.get_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective self.maybe_construct_objective() File 
"/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in maybe_construct_objective self._construct_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized if loss_obj.get_loss_value_for_objective() is not None: File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective self._prepare() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare self._loss_value = self.loss.get_value() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op edges, weights, start_end_states = tf_compat.v1.py_func( Node: 
'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' 2 root error(s) found. (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed Traceback (most recent call last): File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__ ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__ self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]] [[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]] (1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__ ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in __init__ self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]] 0 successful operations. 0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch': File "/u/vieting/setups/swb/20230406_feat/work/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/rnn.py", line 11, in
main() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 634, in main execute_main_task() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/main.py", line 439, in execute_main_task engine.init_train_from_config(config, train_data, dev_data, eval_data) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1149, in init_train_from_config self.init_network_from_config(config) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1234, in init_network_from_config self._init_network(net_desc=net_dict, epoch=self.epoch) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1429, in _init_network self.network, self.updater = self.create_network( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/engine.py", line 1491, in create_network updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/updater.py", line 172, in init self.loss = network.get_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1552, in get_objective self.maybe_construct_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1545, in 
maybe_construct_objective self._construct_objective() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1529, in _construct_objective losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 1499, in get_losses_initialized if loss_obj.get_loss_value_for_objective() is not None: File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 3957, in get_loss_value_for_objective self._prepare() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/network.py", line 4080, in _prepare self._loss_value = self.loss.get_value() File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/layers/basic.py", line 13165, in get_value fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata( File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags) File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op edges, weights, start_end_states = tf_compat.v1.py_func( File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func return func(*args, **kwargs) File 
"/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler return fn(*args, **kwargs) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler return dispatch_target(*args, **kwargs) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func return py_func_common(func, inp, Tout, stateful, name=name) File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common return _internal_py_func( File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func result = gen_script_ops.py_func( File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func _, _, _op, _outputs = _op_def_library._apply_op_helper( File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper op = g._create_op_internal(op_type_name, inputs, dtypes=None, File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal ret = Operation.from_node_def( Exception UnknownError() in step 0. 
(pid 3325131) Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>, ops [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
    line: return fn(*args)
    locals:
      fn = <function BaseSession._do_run.<locals>._run_fn at 0x7f2192d77d30>
      args = ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638], [-0.09610788], [-0.05115783], ..., [ 0. ], [ 0. ], [ 0. ]], [[-0.00226238], [-0.01049833], [-0.00...
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
    line: return self._call_tf_sessionrun(options, feed_dict, fetch_list, target_list, run_metadata)
    locals:
      self = <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._call_tf_sessionrun = <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f2571096ac0>>
      options = None
      feed_dict = {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638], [-0.09610788], [-0.05115783], ..., [ 0. ], [ 0. ], [ 0. ]], [[-0.00226238], [-0.01049833], [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
      run_metadata = <local> None
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
    line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict, fetch_list, target_list, run_metadata)
    locals:
      tf_session = <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
      tf_session.TF_SessionRun_wrapper = <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f2538137300>
      self = <tensorflow.python.client.session.Session object at 0x7f2571096ac0>
      self._session = <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f2423372a70>
      options = None
      feed_dict = {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2422de3eb0>: array([[[-0.05505638], [-0.09610788], [-0.05115783], ..., [ 0. ], [ 0. ], [ 0. ]], [[-0.00226238], [-0.01049833], [-0.001...
      fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f24250d81b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423f96cf0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f2423b01830>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
      target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa970>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f24080fa930>]
      run_metadata = <local> None
UnknownError: 2 root error(s) found. (0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read return Unpickler(p).load()
EOFError: Ran out of input
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__ ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper return func(*args, **kwargs)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 511, in get_automata_for_batch instance = self._get_instance(i)
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 417, in _get_instance self._maybe_create_new_instance()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 405, in _maybe_create_new_instance self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 80, in init self.init()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 302, in init self._start_child()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 169, in _start_child raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core
diff --git a/returnn/sprint/error_signals.py b/returnn/sprint/error_signals.py
index 735ac363..1c204e68 100644
--- a/returnn/sprint/error_signals.py
+++ b/returnn/sprint/error_signals.py
@@ -130,7 +130,7 @@ class SprintSubprocessInstance:

     def _start_child(self):
         assert self.child_pid is None
-        self.pipe_c2p = self._pipe_open()
+        self.pipe_c2p = self._pipe_open(buffered=True)
         self.pipe_p2c = self._pipe_open()
         args = self._build_sprint_args()
         print("SprintSubprocessInstance: exec", args, file=log.v5)
@@ -169,14 +169,14 @@ class SprintSubprocessInstance:
             raise Exception("SprintSubprocessInstance Sprint init failed")

     # noinspection PyMethodMayBeStatic
-    def _pipe_open(self):
+    def _pipe_open(self, buffered=False):
         readend, writeend = os.pipe()
         if hasattr(os, "set_inheritable"):
             # https://www.python.org/dev/peps/pep-0446/
             os.set_inheritable(readend, True)
             os.set_inheritable(writeend, True)
-        readend = os.fdopen(readend, "rb", 0)
-        writeend = os.fdopen(writeend, "wb", 0)
+        readend = os.fdopen(readend, "rb", -bool(buffered))  # -1 is default for buffered
+        writeend = os.fdopen(writeend, "wb", -bool(buffered))
         return readend, writeend

     @property
AFAIR, the problem only occurs when running in an apptainer environment. The buffer does not contain all the data, and RETURNN crashes because the RASR automata are truncated or incomplete.
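For readers unfamiliar with the `buffered` flag in the patch above, here is a minimal standalone sketch of what it changes. The helper names `pipe_open` and `roundtrip` are hypothetical and not part of RETURNN; the sketch only shows that `-bool(buffered)` evaluates to `-1` (Python's default buffering) or `0` (unbuffered), and that a small pickle survives a round trip through either kind of pipe. It does not reproduce the apptainer failure, and the thread does not establish that buffering is the cause of the `EOFError`.

```python
import os
import pickle


def pipe_open(buffered=False):
    """Hypothetical stand-in for the patched _pipe_open.

    -bool(buffered) is -1 (default buffering) if buffered is true,
    and 0 (unbuffered raw file objects) otherwise.
    """
    read_fd, write_fd = os.pipe()
    buffering = -bool(buffered)
    read_end = os.fdopen(read_fd, "rb", buffering)
    write_end = os.fdopen(write_fd, "wb", buffering)
    return read_end, write_end


def roundtrip(obj, buffered):
    """Pickle a small object into the pipe and unpickle it from the other end."""
    read_end, write_end = pipe_open(buffered=buffered)
    try:
        pickle.dump(obj, write_end)
    finally:
        # Closing flushes any buffered bytes and lets the reader see EOF.
        write_end.close()
    with read_end:
        return pickle.load(read_end)


if __name__ == "__main__":
    payload = {"edges": [0, 1, 2], "weights": [0.5, 0.25]}
    print(roundtrip(payload, buffered=True) == payload)
    print(roundtrip(payload, buffered=False) == payload)
```

With `buffering=0`, `os.fdopen` returns a raw file object, and a single `read` call on a raw object may return fewer bytes than requested. A buffered reader keeps reading until it has the requested amount or hits EOF, which is presumably the motivation for the patch.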
So for reference, the actual error is this:
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 164, in _start_child
ret = self._read()
File "/work/asr4/vieting/setups/swb/work/20230406_feat/i6_core/tools/git/CloneGitRepositoryJob.Sc1EzS78fRSC/output/repository/returnn/sprint/error_signals.py", line 225, in _read
return Unpickler(p).load()
EOFError: Ran out of input
I just tested the proposed patch and it does not fix the issue for my example.
Can you link the full patch? It seems incomplete here.
Sure, just edited the comment.
@vieting I pushed something which should fix this. Can you try?
(For reference, there was also an EOFError in #1363, but I think that was another problem.)
Note: I did not actually test my recent change, as I don't have any setup ready to try this. Please try it out and report if it works.
Just tested and I still get the error.
Log:
RETURNN starting up, version 1.20231108.124950+git.a3d1094d, date/time 2023-11-08-14-13-24 (UTC+0100), pid 352402, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-283
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '4'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 14088248937803725314
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10089005056
locality {
bus_id: 2
numa_node: 1
links {
}
}
incarnation: 17654959729817767865
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:81:00.0, compute capability: 7.5"
xla_global_id: 416903419
Using gpu device 4: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-283', GPU 4, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer batch_norm 'conformer_1_conv_mod_bn' #: 512
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_1_conv_mod_dropout' #: 512
layer gating 'conformer_1_conv_mod_glu' #: 512
layer layer_norm 'conformer_1_conv_mod_ln' #: 512
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_1_conv_mod_res_add' #: 512
layer activation 'conformer_1_conv_mod_swish' #: 512
layer copy 'conformer_1_ffmod_1_dropout' #: 512
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
layer copy 'conformer_1_ffmod_2_dropout' #: 512
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
layer copy 'conformer_1_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_1_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_1_output' #: 512
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer copy 'input_dropout' #: 512
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-13-13-24
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 353093
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
Exception UnknownError() in step 0. (pid 352402)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f3307fe4f70>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f36aecb2300>
self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f35986e9470>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
handle = <local> None
final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f36e7563ac0>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f3307fe4f70>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35983ad7b0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893975b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f35893a4ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f3589379470>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f95b0>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f35917f9770>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "./returnn/rnn.py", line 11, in <module>\n main()\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 11284
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 166, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 235, in _read
raise EOFError
EOFError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 533, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 439, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 427, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 82, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 324, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 171, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7f36e7695df0>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat = <local> <function pformat at 0x7f36eb9e5c10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 352402)
@albertz check /work/asr4/vieting/tmp/20231108_tf213_sprint_op/run_example.sh if you want to test it yourself.
@christophmluscher @NeoLegends does this relate to the RASR compiled with TF 2.13? Do you recognize this error?
Could it be a problem that RASR was compiled with my old TF 2.8 image? I still use the same RASR binary with the new image. Loading the automata does not require TF, so I thought I could keep using the same RASR.
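One way to test this hypothesis is to inspect the shared-library dependencies of the RASR binary inside the new image. The following is only a sketch: it assumes a Linux system with `ldd` available, the helper names `run_ldd` and `parse_ldd_output` are hypothetical, and the sample output is invented for illustration, not taken from the actual RASR binary.

```python
import subprocess

# Invented sample output for illustration only (NOT from the real RASR binary).
SAMPLE_LDD_OUTPUT = """\
\tlinux-vdso.so.1 (0x00007ffc00000000)
\tlibtensorflow_cc.so.2 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)
"""


def run_ldd(binary_path):
    """Return the text output of `ldd` for the given binary (Linux only)."""
    proc = subprocess.run(["ldd", binary_path], capture_output=True, text=True, check=False)
    return proc.stdout


def parse_ldd_output(output):
    """Split ldd output into resolved libraries and libraries that were not found."""
    resolved, missing = {}, []
    for line in output.splitlines():
        if "=>" not in line:
            continue  # e.g. the vdso or the dynamic loader line
        name, _, target = line.strip().partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("not found"):
            missing.append(name)
        else:
            # Drop the trailing load address, e.g. " (0x00007f...)".
            resolved[name] = target.split(" (")[0]
    return resolved, missing


if __name__ == "__main__":
    resolved, missing = parse_ldd_output(SAMPLE_LDD_OUTPUT)
    print("missing:", missing)
    print("resolved:", resolved)
```

Running `parse_ldd_output(run_ldd(path_to_rasr_binary))` inside the container would show whether any TensorFlow library is linked at all and, if so, which file it resolves to. An empty `missing` list does not rule out an ABI mismatch; it only rules out unresolved libraries.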
@vieting I pushed another small change. Can you try again?
Unfortunately, this still does not fix my example.
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
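For context on the message above: "got EOF after 0 bytes" while reading the 4-byte size header suggests the child process closed its end of the pipe before sending anything, which fits a RASR subprocess that died during init. The sketch below shows a generic length-prefixed pickle protocol with that failure mode. It is an illustration under assumptions, not RETURNN's actual implementation: the function names `write_framed`, `read_exactly` and `read_framed` and the little-endian unsigned header are hypothetical choices.

```python
import io
import pickle
import struct


def write_framed(stream, obj):
    """Write obj as a 4-byte size header followed by the pickled payload."""
    payload = pickle.dumps(obj)
    stream.write(struct.pack("<I", len(payload)))  # header format is an assumption
    stream.write(payload)


def read_exactly(stream, size):
    """Read exactly `size` bytes, or raise EOFError if the stream ends early."""
    buf = b""
    while len(buf) < size:
        chunk = stream.read(size - len(buf))
        if not chunk:
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, len(buf)))
        buf += chunk
    return buf


def read_framed(stream):
    """Read one size-prefixed pickled object from the stream."""
    (size,) = struct.unpack("<I", read_exactly(stream, 4))
    return pickle.loads(read_exactly(stream, size))


if __name__ == "__main__":
    pipe = io.BytesIO()
    write_framed(pipe, ("ok", "init"))
    pipe.seek(0)
    print(read_framed(pipe))
    try:
        # An empty stream stands in for a child that exited without writing.
        read_framed(io.BytesIO(b""))
    except EOFError as exc:
        print("EOFError:", exc)
```

With such a protocol the parent cannot learn why the child exited from the pipe alone, so the RASR log or stderr of the subprocess is the place to look for the actual cause.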
RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-08-15-13-28 (UTC+0100), pid 356353, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-283
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '4'.
Collecting TensorFlow device list...
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13595377529408947728
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10089005056
locality {
bus_id: 2
numa_node: 1
links {
}
}
incarnation: 17849739553926303687
physical_device_desc: "device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:81:00.0, compute capability: 7.5"
xla_global_id: 416903419
Using gpu device 4: NVIDIA GeForce RTX 2080 Ti
Hostname 'cn-283', GPU 4, GPU-dev-name 'NVIDIA GeForce RTX 2080 Ti', GPU-memory 9.4GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer batch_norm 'conformer_1_conv_mod_bn' #: 512
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_1_conv_mod_dropout' #: 512
layer gating 'conformer_1_conv_mod_glu' #: 512
layer layer_norm 'conformer_1_conv_mod_ln' #: 512
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_1_conv_mod_res_add' #: 512
layer activation 'conformer_1_conv_mod_swish' #: 512
layer copy 'conformer_1_ffmod_1_dropout' #: 512
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
layer copy 'conformer_1_ffmod_2_dropout' #: 512
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
layer copy 'conformer_1_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_1_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_1_output' #: 512
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer copy 'input_dropout' #: 512
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-14-13-28
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 356974
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:37,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
Exception UnknownError() in step 0. (pid 356353)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f4267b80c10>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b68ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b688b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44ef901eb0>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5d70>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5db0>]
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7f46444243f0>
self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7f44f83404f0>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b68ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b688b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44ef901eb0>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5d70>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5db0>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
handle = <local> None
final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7f46458c3d60>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7f46458c3d60>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7f4267b80c10>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f80b9630>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b68ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44f2b688b0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7f44ef901eb0>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5d70>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7f44eaac5db0>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type [34;1m=[0m [34m<builtin>[0m [34m<[0m[34mclass[0m [36m'type'[0m[34m>[0m
e [34;1m=[0m [34m<not found>[0m
node_def [34;1m=[0m [34m<local>[0m name[34m:[0m [36m"objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"[0m
op[34m:[0m [36m"PyFunc"[0m
input[34m:[0m [36m"extern_data/placeholders/seq_tag/seq_tag"[0m
attr [34m{[0m
key[34m:[0m [36m"token"[0m
value [34m{[0m
s[34m:[0m [36m"pyfunc_0"[0m
[34m}[0m
[34m}[0m
attr [34m{[0m
key[34m:[0m [36m"Tout"[0m
value [34m{[0m
list [34m{[0m
type[34m:[0m DT_INT32
type[34m:[0m DT_FLOAT
type[34m:[0m DT_INT...
op [34;1m=[0m [34m<local>[0m [34m<[0mtf[34m.[0mOperation [36m'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'[0m type[34m=[0mPyFunc[34m>[0m
message [34;1m=[0m [34m<local>[0m [36m'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "./returnn/rnn.py", line 11, in <module>\n main()\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai[0m..., len [34m=[0m 12234
[31mUnknownError[0m: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7f4646730e50>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat = <local> <function pformat at 0x7f464aa7ec10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 356353)
I get the same error when using a tf 2.14 image and RASR compiled using that image.
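For context, the `EOFError: expected to read 4 bytes but got EOF after 0 bytes` in the log is what a length-prefixed read reports when the other end of the pipe closes before sending anything, which would fit a child process that exits during startup. The following is only a minimal sketch of that pattern, not RETURNN's actual code: the helper names `read_exact`, `read_framed_pickle` and `write_framed_pickle` are hypothetical, and the little-endian 4-byte size header is an assumption.

```python
import io
import pickle
import struct


def read_exact(stream, size):
    """Read exactly `size` bytes from `stream`, or raise EOFError if it ends early."""
    buf = io.BytesIO()
    read_size = 0
    while read_size < size:
        chunk = stream.read(size - read_size)
        if not chunk:
            # The writer closed the stream before sending everything we expected.
            raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
        buf.write(chunk)
        read_size += len(chunk)
    return buf.getvalue()


def read_framed_pickle(stream):
    """Read one object framed as: size header, then pickle payload."""
    # Assumption: 4-byte little-endian unsigned size header.
    (size,) = struct.unpack("<I", read_exact(stream, 4))
    return pickle.loads(read_exact(stream, size))


def write_framed_pickle(stream, obj):
    """Write one object using the same framing as read_framed_pickle."""
    payload = pickle.dumps(obj)
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)
```

With this framing, reading from an empty stream fails while reading the size header, so the error says nothing about why the writer went away. That information has to come from the writer's own output.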
Is that the original stdout + stderr, or just the log?
It looks as if RASR does not start correctly at all. If it did, you should see this on stdout:
print("RETURNN SprintControl[pid %i] Python module load" % os.getpid())
And then:
print(
(
"RETURNN SprintControl[pid %i] init: "
"name=%r, sprint_unit=%r, version_number=%r, callback=%r, ref=%r, config=%r, kwargs=%r"
)
% (os.getpid(), name, sprint_unit, version_number, callback, reference, config, kwargs)
)
If you don't see that, then my recent fixes and Tina's patch are not related to your issue at all.
You should check the RASR log then. There should be some error from RASR, probably Python-related, for example that it could not load the module, or that some import is missing.
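A quick way to check for these two markers in a captured stdout file could look like the sketch below. The function name `check_sprint_control_markers` is hypothetical, and the patterns are derived only from the two print statements quoted above.

```python
import re

# Patterns built from the two print statements quoted above.
MODULE_LOAD = re.compile(r"RETURNN SprintControl\[pid (\d+)\] Python module load")
INIT = re.compile(r"RETURNN SprintControl\[pid (\d+)\] init: ")


def check_sprint_control_markers(text):
    """Report which SprintControl startup markers occur in the captured output."""
    return {
        "module_load": bool(MODULE_LOAD.search(text)),
        "init": bool(INIT.search(text)),
    }
```

For example, `check_sprint_control_markers(open("stdout.txt").read())` (with `stdout.txt` standing in for wherever the output was captured) returns a dict with both values `False` if neither line was printed, which would point at RASR failing before the Python module was loaded.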
What I posted before was from the log. The following is copied from stdout and stderr (with tf 2.14 image, also for RASR compilation):
vieting@cn-251:/work/asr4/vieting/tmp/20231108_tf213_sprint_op$ ./run_example_rasr_tf214.sh
RETURNN starting up, version 1.20231108.140626+git.9fe93590.dirty, date/time 2023-11-08-16-43-54 (UTC+0100), pid 2130233, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.tf214.config']
Hostname: cn-251
2023-11-08 16:44:01.024863: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:01.024944: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:01.034051: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-08 16:44:02.271356: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
TensorFlow: 2.14.0 (v2.14.0-rc1-21-g4dacf3f368e) (<not-under-git> in /usr/local/lib/python3.11/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
2023-11-08 16:44:23.424846: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 11581945563073303627
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10901061632
locality {
bus_id: 2
numa_node: 1
links {
}
}
incarnation: 1815047742352363074
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1"
xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-251', GPU 2, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.2GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-08 16:44:31.951062: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:1723: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
2023-11-08 16:44:32.241797: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer copy 'specaug' #: 750
net params #: 12409788
net trainable params: [<tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-08 16:44:34.658621: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:382] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-15-43-53
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
2023-11-08 16:44:39.517531: W tensorflow/c/c_api.cc:305] Operation '{name:'global_step' id:357 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2130824
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-08 16:44:43.478818: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:43.478967: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:43.479063: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 2130824] Python module load
RETURNN SprintControl[pid 2130824] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f58637e2e80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, config={'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 2130824] PythonControl create {'c2p_fd': 35, 'p2c_fd': 36, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f58637e2e80>}
RETURNN SprintControl[pid 2130824] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f58637e2e80>}
RETURNN SprintControl[pid 2130824] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f58637e2e80>, 'config': {'c2p_fd': '35', 'p2c_fd': '36', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 2130824] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f58637e2e80>, {}
RETURNN SprintControl[pid 2130824] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
SprintSubprocessInstance: exec ['/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:36,p2c_fd:38,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', '--*.model-combination.acoustic-model.hmm.early-recombination=no', 
'--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=yes', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2130845
/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: Relink `/usr/local/lib/python3.11/dist-packages/tensorflow/libtensorflow_framework.so.2' with `/lib/x86_64-linux-gnu/libz.so.1' for IFUNC symbol `crc32_z'
2023-11-08 16:44:44.788087: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-08 16:44:44.788217: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-08 16:44:44.788276: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
RETURNN SprintControl[pid 2130845] Python module load
RETURNN SprintControl[pid 2130845] init: name='Sprint.PythonControl', sprint_unit='NnTrainer.pythonControl', version_number=5, callback=<built-in method callback of PyCapsule object at 0x7f6940b4ee80>, ref=<capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, config={'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, kwargs={}
RETURNN SprintControl[pid 2130845] PythonControl create {'c2p_fd': 36, 'p2c_fd': 38, 'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f6940b4ee80>}
RETURNN SprintControl[pid 2130845] PythonControl init {'name': 'Sprint.PythonControl', 'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}, 'sprint_unit': 'NnTrainer.pythonControl', 'version_number': 5, 'min_version_number': 4, 'callback': <built-in method callback of PyCapsule object at 0x7f6940b4ee80>}
RETURNN SprintControl[pid 2130845] init for Sprint.PythonControl {'reference': <capsule object "Sprint.PythonControl.Internal" at 0x7f6940b4ee80>, 'config': {'c2p_fd': '36', 'p2c_fd': '38', 'minPythonControlVersion': '4'}}
RETURNN SprintControl[pid 2130845] PythonControl run_control_loop: <built-in method callback of PyCapsule object at 0x7f6940b4ee80>, {}
RETURNN SprintControl[pid 2130845] PythonControl run_control_loop control: '<version>RWTH ASR 0.9beta (431c74d54b895a2a4c3689bcd5bf641a878bb925)\n</version>'
2023-11-08 16:45:03.663421: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:442] Loaded cuDNN version 8600
Fatal Python error: Segmentation fault
Current thread 0x00007f69453ea380 (most recent call first):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 509 in _handle_cmd
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 524 in handle_next
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 550 in run_control_loop
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
Segmentation fault
Creating stack trace (innermost first):
#2 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
#3 /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f69477749fc]
#4 /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f6947720476]
#5 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
#6 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55d2626e440a]
#7 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55d2626f3c72]
#8 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55d262675257]
#9 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55d2626789ac]
#10 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55d262671194]
#11 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55d262660df7]
#12 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55d262482e43]
#13 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55d262483183]
#14 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55d262485cbf]
#15 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55d262480516]
#16 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55d262480d5e]
#17 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55d262359054]
#18 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55d26233e833]
#19 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55d26233ee6d]
#20 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7f697baa0073]
#21 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7f697ba50ff7]
#22 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7f697b9de96a]
#23 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#24 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
#25 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
#26 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#27 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
#28 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
#29 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#30 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7f697ba540d8]
#31 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7f697ba53b88]
#32 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55d26258c876]
#33 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55d262332fbf]
#34 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55d2620df317]
#35 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55d2620b8e13]
#36 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55d26211e413]
#37 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55d2620ba577]
#38 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55d2620b852d]
#39 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6947707d90]
#40 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6947707e40]
#41 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55d2620dd7a5]
Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
locals:
py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7ff04b0351c0>
sprint_opts = <local> {'sprintExecPath': '/work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-channel --*.version....
tags = <not found>
py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
locals:
edges = <not found>
weights = <not found>
start_end_states = <not found>
sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7ff04c59c1d0>
sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7ff04c59c1d0>>
tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in SprintInstancePool.get_automata_for_batch
line: r = instance._read()
locals:
r = <local> ('ok', 9, 22, array([ 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 0, 2, 4,
6, 7, 5, 6, 4, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5,
6, 7, 2, 4, 6, 8, 8, 8, 8, 8, 0, 6, 0, 22, 0, 48, 0,
0, 6, 0, 22, 0, 48, 0, 6, 22, 48, 48, 0, 48,...
instance = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7ff102701d10>
instance._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7ff102701d10>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
line: return util.read_pickled_object(p)
locals:
util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
util.read_pickled_object = <global> <function read_pickled_object at 0x7ff17f482b60>
p = <local> <_io.FileIO name=35 mode='rb' closefd=True>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
locals:
size_raw = <not found>
read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7ff17f482ac0>
p = <local> <_io.FileIO name=35 mode='rb' closefd=True>
getvalue = <not found>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
locals:
EOFError = <builtin> <class 'EOFError'>
size = <local> 4
read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
2023-11-08 16:45:06.805151: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
2023-11-08 16:45:06.805314: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4669204044388377120
2023-11-08 16:45:06.805394: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 14394728958513161507
2023-11-08 16:45:06.805423: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4611900397994247129
2023-11-08 16:45:06.805450: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11246935140361182411
2023-11-08 16:45:06.805476: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 3527483492372743068
2023-11-08 16:45:06.805500: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 455321662105441778
2023-11-08 16:45:06.805527: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 4997316685218163964
2023-11-08 16:45:06.805550: I tensorflow/core/framework/local_rendezvous.cc:421] Local rendezvous recv item cancelled. Key hash: 11970666840078253952
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
Exception UnknownError() in step 0. (pid 2130233)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7ff14a9916e0>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7ff052c1fb30>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
handle = <local> None
final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>\n File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7ff1800af490>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat = <local> <function pformat at 0x7ff183bc9e40>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1402, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1385, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1478, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7ff14a9916e0>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7ff052c1fb30>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 972, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 7
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1215, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
handle = <local> None
final_targets = <local> [<tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1395, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7ff10078c0d0>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7ff04bb38860>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff052ae50f0>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051676ef0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff051674e70>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7ff05177f870>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7ff04c6b2770>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/client/session.py", line 1421, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>\n File "/work/asr4/vieting/tmp/20231108_tf2..., len = 8772
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_127]]
(1) UNKNOWN: EOFError: expected to read 4 bytes but got EOF after 0 bytes
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 270, in __call__
ret = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 528, in get_automata_for_batch
r = instance._read()
^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/./returnn/rnn.py", line 11, in <module>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/deprecation.py", line 383, in new_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/util/dispatch.py", line 1260, in op_dispatch_handler
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 798, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 773, in py_func_common
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/script_ops.py", line 380, in _internal_py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/op_def_library.py", line 796, in _apply_op_helper
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 2657, in _create_op_internal
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/framework/ops.py", line 1161, in from_node_def
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 2130233)
SprintSubprocessInstance: interrupt child proc 2130824
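The `EOFError: expected to read 4 bytes but got EOF after 0 bytes` above is what a parent process sees when it waits for a length-prefixed message and the child at the other end of the pipe has already died. A minimal sketch of that mechanism, assuming a 4-byte length header followed by the payload (the helper names `read_exact`, `read_message` and the two demo functions are made up for illustration and are not RETURNN's actual implementation):

```python
import os
import struct


def read_exact(fd, size):
    """Read exactly `size` bytes from file descriptor `fd`, or raise EOFError."""
    buf = b""
    while len(buf) < size:
        chunk = os.read(fd, size - len(buf))
        if not chunk:  # empty read: the writing end of the pipe is closed
            raise EOFError(
                "expected to read %i bytes but got EOF after %i bytes" % (size, len(buf))
            )
        buf += chunk
    return buf


def read_message(fd):
    """Read one message: a 4-byte little-endian length, then the payload."""
    (length,) = struct.unpack("<i", read_exact(fd, 4))
    return read_exact(fd, length)


def demo_child_died():
    """Simulate a child that crashed before it wrote anything to the pipe."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)  # stands in for the crashed child: its pipe end is gone
    try:
        read_message(read_fd)
    except EOFError as exc:
        return str(exc)
    finally:
        os.close(read_fd)
    return None


def demo_healthy_child():
    """Simulate a child that answers with a proper length-prefixed message."""
    read_fd, write_fd = os.pipe()
    payload = b"ok"
    os.write(write_fd, struct.pack("<i", len(payload)) + payload)
    os.close(write_fd)
    try:
        return read_message(read_fd)
    finally:
        os.close(read_fd)
```

Under these assumptions, the parent-side exception carries no information about why the child went away, so the child's own stdout/stderr is the place to look for the root cause.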
The RASR log of the nn trainer does not contain anything that looks particularly suspicious to me.
What about this?
configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
And in your stdout, you see the actual error:
Fatal Python error: Segmentation fault
Current thread 0x00007f69453ea380 (most recent call first):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 499 in _handle_cmd_export_allophone_state_fsa_by_segment_name
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 509 in _handle_cmd
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 524 in handle_next
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/control.py", line 550 in run_control_loop
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector (total: 37)
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
<?xml version="1.0" encoding="UTF-8"?>
<sprint>
PROGRAM DEFECTIVE (TERMINATED BY SIGNAL):
Segmentation fault
Creating stack trace (innermost first):
#2 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
#3 /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c) [0x7f69477749fc]
#4 /lib/x86_64-linux-gnu/libc.so.6(raise+0x16) [0x7f6947720476]
#5 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
#6 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl13TrimAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a) [0x55d2626e440a]
#7 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK3Ftl14CacheAutomatonIN3Fsa9AutomatonEE8getStateEj+0x3a2) [0x55d2626f3c72]
#8 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fb257) [0x55d262675257]
#9 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(+0x9fe9ac) [0x55d2626789ac]
#10 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Am15TransitionModel5applyEN4Core3RefIKN3Fsa9AutomatonEEEib+0x274) [0x55d262671194]
#11 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Am24ClassicTransducerBuilder20applyTransitionModelEN4Core3RefIKN3Fsa9AutomatonEEE+0x387) [0x55d262660df7]
#12 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x123) [0x55d262482e43]
#13 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder17addLoopTransitionEN4Core3RefIKN3Fsa9AutomatonEEE+0x53) [0x55d262483183]
#14 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech23CTCTopologyGraphBuilder15buildTransducerEN4Core3RefIKN3Fsa9AutomatonEEE+0x8f) [0x55d262485cbf]
#15 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder15buildTransducerERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x66) [0x55d262480516]
#16 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Speech26AllophoneStateGraphBuilder5buildERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2e) [0x55d262480d5e]
#17 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZNK2Nn25AllophoneStateFsaExporter23exportFsaForOrthographyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x54) [0x55d262359054]
#18 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal32exportAllophoneStateFsaBySegNameEP7_objectS3_+0x133) [0x55d26233e833]
#19 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl8Internal8callbackEP7_objectS3_+0x25d) [0x55d26233ee6d]
#20 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1cd073) [0x7f697baa0073]
#21 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_MakeTpCall+0x87) [0x7f697ba50ff7]
#22 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x477a) [0x7f697b9de96a]
#23 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#24 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
#25 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
#26 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#27 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x181058) [0x7f697ba54058]
#28 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyEval_EvalFrameDefault+0x50ae) [0x7f697b9df29e]
#29 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x26bf9a) [0x7f697bb3ef9a]
#30 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(+0x1810d8) [0x7f697ba540d8]
#31 /lib/x86_64-linux-gnu/libpython3.11.so.1.0(_PyObject_Call+0x128) [0x7f697ba53b88]
#32 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN6Python8PyCallKwEP7_objectPKcS3_z+0xe6) [0x55d26258c876]
#33 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN2Nn13PythonControl16run_control_loopEv+0x5f) [0x55d262332fbf]
#34 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer13pythonControlEv+0x167) [0x55d2620df317]
#35 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN9NnTrainer4mainERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x303) [0x55d2620b8e13]
#36 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x23) [0x55d26211e413]
#37 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_ZN4Core11Application4mainEiPPc+0x577) [0x55d2620ba577]
#38 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(main+0x3d) [0x55d2620b852d]
#39 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6947707d90]
#40 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f6947707e40]
#41 /work/asr4/hilmes/dev/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard(_start+0x25) [0x55d2620dd7a5]
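The frames in the RASR stack trace carry mangled C++ symbol names, which makes it harder to see which functions are involved. A small sketch for pulling the symbol and offset out of each frame line and, if the binutils tool `c++filt` happens to be installed, demangling it (the helpers `parse_frame` and `demangle` are hypothetical names, and the regular expression only assumes the frame layout seen in the trace above):

```python
import re
import shutil
import subprocess

# Matches lines such as:
#   #13 /path/to/binary(_ZN...+0x53) [0x55d262483183]
#   #2 /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f6947720520]
FRAME_RE = re.compile(
    r"^#(?P<index>\d+)\s+(?P<binary>\S+?)"
    r"\((?P<symbol>[^+()]*)\+0x(?P<offset>[0-9a-fA-F]+)\)"
    r"\s+\[0x(?P<address>[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line is no frame."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    return {
        "index": int(match.group("index")),
        "binary": match.group("binary"),
        "symbol": match.group("symbol") or None,  # None for frames without a symbol
        "offset": int(match.group("offset"), 16),
        "address": int(match.group("address"), 16),
    }


def demangle(symbol):
    """Demangle via c++filt if it is available, otherwise return the input unchanged."""
    tool = shutil.which("c++filt")
    if tool is None:
        return symbol
    result = subprocess.run([tool, symbol], capture_output=True, text=True, check=False)
    return result.stdout.strip() or symbol


def summarize(trace_lines):
    """Yield (frame index, readable symbol) for every frame that has a symbol."""
    for line in trace_lines:
        frame = parse_frame(line)
        if frame is not None and frame["symbol"] is not None:
            yield frame["index"], demangle(frame["symbol"])
```

Feeding the trace lines through `summarize` gives a readable call chain for the frames inside `nn-trainer`; the frames that only show an offset (for example `(+0x9fb257)`) have no exported symbol and would need the binary's debug information to resolve.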
What about this? configuration error: failed to open file "neural-network-trainer.config" for reading. (No such file or directory)
I only pass "sprint_opts" with "sprintConfigStr" for the fast_bw loss. I am not sure why "neural-network-trainer.config" is looked up as well; I do not define it anywhere in my config.
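For readers unfamiliar with this setup, a rough sketch of how such an output layer might look in a RETURNN network dict. Only the loss name `fast_bw` and the keys `sprint_opts` and `sprintConfigStr` come from this thread; the `sprintExecPath` key, the layer's `from` entry and all paths are assumptions and placeholders, not the actual config used here:

```python
# Placeholder values throughout; adjust to the real RASR binary and config.
sprint_opts = {
    # Assumed option name for the RASR nn-trainer executable (hypothetical path).
    "sprintExecPath": "/path/to/nn-trainer",
    # Command-line style options passed to RASR (hypothetical config file path).
    "sprintConfigStr": "--config=/path/to/rasr.loss.config",
}

# Output layer with the fast_bw loss; "from" is an assumption for illustration.
output_layer = {
    "class": "softmax",
    "from": "encoder",
    "loss": "fast_bw",
    "loss_opts": {"sprint_opts": sprint_opts},
}

network = {"output": output_layer}
```

Nothing in this sketch refers to "neural-network-trainer.config", which matches the observation that the file name is not set anywhere in the user-side config.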
Note that the segmentation fault only occurs with the tf2.14 image and RASR. There might be something wrong on that side as well, see .
With my previous settings (tf2.13, RASR compiled with tf2.8), this is the combined stdout + stderr:
vieting@cn-251:/work/asr4/vieting/tmp/20231108_tf213_sprint_op$ ./run_example_patch.sh
RETURNN starting up, version 1.20231108.140626+git.9fe93590, date/time 2023-11-08-17-07-35 (UTC+0100), pid 2131331, cwd /work/asr4/vieting/tmp/20231108_tf213_sprint_op, Python /usr/bin/python3
RETURNN command line options: ['returnn.config']
Hostname: cn-251
MEMORY: main proc python3(2131331) initial: rss=40.9MB pss=40.9MB uss=40.9MB shared=4.0KB
MEMORY: total (1 procs): pss=40.9MB uss=40.9MB
2023-11-08 17:07:41.035240: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
MEMORY: main proc python3(2131331) increased RSS: rss=212.4MB pss=212.4MB uss=212.4MB shared=4.0KB
MEMORY: total (1 procs): pss=212.4MB uss=212.4MB
MEMORY: main proc python3(2131331) increased RSS: rss=283.6MB pss=283.6MB uss=283.6MB shared=4.0KB
MEMORY: total (1 procs): pss=283.6MB uss=283.6MB
MEMORY: main proc python3(2131331) increased RSS: rss=420.4MB pss=419.8MB uss=419.4MB shared=0.9MB
MEMORY: total (1 procs): pss=419.8MB uss=419.4MB
/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:2258: SyntaxWarning: "is not" with a literal. Did you mean "!="?
if dim is not 1:
/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:6254: SyntaxWarning: "is" with a literal. Did you mean "=="?
if start is 0 and stop is None:
TensorFlow: 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) (<not-under-git> in /usr/local/lib/python3.8/dist-packages/tensorflow)
Use num_threads=1 (but min 2) via OMP_NUM_THREADS.
Setup TF inter and intra global thread pools, num_threads 2, session opts {'log_device_placement': False, 'device_count': {'GPU': 0}, 'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2}.
CUDA_VISIBLE_DEVICES is set to '2'.
Collecting TensorFlow device list...
2023-11-08 17:08:04.048461: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
Local devices available to TensorFlow:
1/2: name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12364557139125826212
xla_global_id: -1
2/2: name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 10901061632
locality {
bus_id: 2
numa_node: 1
links {
}
}
incarnation: 14856658680689284311
physical_device_desc: "device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1"
xla_global_id: 416903419
Using gpu device 2: NVIDIA GeForce GTX 1080 Ti
Hostname 'cn-251', GPU 2, GPU-dev-name 'NVIDIA GeForce GTX 1080 Ti', GPU-memory 10.2GB
MEMORY: main proc python3(2131331) increased RSS: rss=1.1GB pss=1.0GB uss=1.0GB shared=5.5MB
MEMORY: total (1 procs): pss=1.0GB uss=1.0GB
Train data:
input: 1 x 1
output: {'raw': {'dtype': 'string', 'shape': ()}, 'orth': [256, 1], 'data': [1, 2]}
OggZipDataset, sequences: 249229, frames: unknown
Dev data:
MEMORY: main proc python3(2131331) increased RSS: rss=1.7GB pss=1.7GB uss=1.7GB shared=5.5MB
MEMORY: total (1 procs): pss=1.7GB uss=1.7GB
OggZipDataset, sequences: 300, frames: unknown
Learning-rate-control: file learning_rates.swb.ctc does not exist yet
Setup TF session with options {'log_device_placement': False, 'device_count': {'GPU': 1}} ...
2023-11-08 17:08:13.177173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'data': [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)] float32
layer /features/'conv_h_filter': ['conv_h_filter:static:0'(128),'conv_h_filter:static:1'(1),F|F'conv_h_filter:static:2'(150)] float32
layer /features/'conv_h': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_act': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F|F'conv_h:channel'(150)] float32
layer /features/'conv_h_split': [B,T|'⌈((-63+time:var:extern_data:data)+-64)/5⌉'[B],F'conv_h:channel'(150),F|F'conv_h_split_split_dims1'(1)] float32
DEPRECATION WARNING: Explicitly specify in_spatial_dims when there is more than one spatial dim in the input.
This will be disallowed with behavior_version 8.
layer /features/'conv_l': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel'(150),F|F'conv_l:channel'(5)] float32
layer /features/'conv_l_merge': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
DEPRECATION WARNING: MergeDimsLayer, only keep_order=True is allowed
This will be disallowed with behavior_version 6.
layer /features/'conv_l_act_no_norm': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'conv_l_act': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /features/'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'features': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
layer /'specaug': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F|F'conv_h:channel*conv_l:channel'(750)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py:2462: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conv_source': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_source_split_dims1'(1)] float32
layer /'conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],F'conv_h:channel*conv_l:channel'(750),F|F'conv_1:channel'(32)] float32
layer /'conv_1_pool': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/16⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_1:channel'(32)] float32
layer /'conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/32⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_2:channel'(64)] float32
layer /'conv_3': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'conv_h:channel*conv_l:channel//2'(375),F|F'conv_3:channel'(64)] float32
layer /'conv_merged': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conv_h:channel*conv_l:channel//2)*conv_3:channel'(24000)] float32
layer /'input_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'input_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'input_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_1_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_1_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_1_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_1': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_pointwise_conv_1:feature-dense'(1024)] float32
layer /'conformer_1_conv_mod_glu': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'(conformer_1_conv_mod_pointwise_conv_1:feature-dense)//2'(512)] float32
layer /'conformer_1_conv_mod_depthwise_conv': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_bn': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
DEPRECATION WARNING: batch_norm masked_time should be specified explicitly
This will be disallowed with behavior_version 12.
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py:1725: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
layer /'conformer_1_conv_mod_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_pointwise_conv_2': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_conv_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_conv_mod_depthwise_conv:channel'(512)] float32
layer /'conformer_1_mhsa_mod_relpos_encoding': [T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_relpos_encoding_rel_pos_enc_feat'(64)] float32
layer /'conformer_1_mhsa_mod_self_attention': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_att_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_mhsa_mod_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_ln': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_mhsa_mod_self_attention_self_att_feat'(512)] float32
layer /'conformer_1_ffmod_2_linear_swish': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_linear_swish:feature-dense'(2048)] float32
layer /'conformer_1_ffmod_2_dropout_linear': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_dropout': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_ffmod_2_half_res_add': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'conformer_1_output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
layer /'encoder': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'conformer_1_ffmod_2_dropout_linear:feature-dense'(512)] float32
2023-11-08 17:08:14.118488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10396 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1080 Ti, pci bus id: 0000:81:00.0, compute capability: 6.1
layer /'output': [B,T|'⌈((-19+(⌈((-63+time:var:extern_data:data)+-64)/5⌉))+-20)/64⌉'[B],F|F'output:feature-dense'(88)] float32
WARNING:tensorflow:From /work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py:54: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
MEMORY: main proc python3(2131331) increased RSS: rss=1.9GB pss=1.9GB uss=1.8GB shared=31.8MB
MEMORY: total (1 procs): pss=1.9GB uss=1.8GB
OpCodeCompiler call: /usr/local/cuda-11.8/bin/nvcc -shared -O2 -std=c++17 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc -I /usr/local/cuda-11.8/targets/x86_64-linux/include -I /usr/local/cuda-11.8/include -L /usr/local/cuda-11.8/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -arch compute_61 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/third_party/gpus/cuda/include -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/vieting/returnn_tf_cache/ops/FastBaumWelchOp/b50a371e1a/FastBaumWelchOp.cc -o /var/tmp/vieting/returnn_tf_cache/ops/FastBaumWelchOp/b50a371e1a/FastBaumWelchOp.so -L/usr/local/lib/python3.8/dist-packages/scipy.libs -l:libopenblasp-r0-41284840.3.18.so -L/usr/local/lib/python3.8/dist-packages/tensorflow -l:libtensorflow_framework.so.2
MEMORY: sub proc nvcc(2131947) initial: rss=3.4MB pss=2.0MB uss=0.9MB shared=2.5MB
MEMORY: total (2 procs): pss=1.9GB uss=1.8GB
MEMORY: sub proc nvcc(2131947) increased RSS: rss=3.5MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2131954) initial: rss=1.6MB pss=603.0KB uss=236.0KB shared=1.3MB
MEMORY: sub proc cicc(2131955) initial: rss=257.3MB pss=255.3MB uss=254.2MB shared=3.0MB
MEMORY: total (4 procs): pss=2.1GB uss=2.1GB
MEMORY: sub proc cicc(2131955) increased RSS: rss=1.0GB pss=1.0GB uss=1.0GB shared=3.0MB
MEMORY: total (4 procs): pss=2.9GB uss=2.9GB
MEMORY: proc <unknown-dead>(2131954) exited, old: rss=1.6MB pss=603.0KB uss=236.0KB shared=1.3MB
MEMORY: proc cicc(2131955) exited, old: rss=1.0GB pss=1.0GB uss=1.0GB shared=3.0MB
MEMORY: sub proc sh(2131963) initial: rss=1.6MB pss=605.0KB uss=228.0KB shared=1.4MB
MEMORY: sub proc cudafe++(2131964) initial: rss=229.5MB pss=228.3MB uss=227.8MB shared=1.7MB
MEMORY: total (4 procs): pss=2.1GB uss=2.1GB
MEMORY: sub proc cudafe++(2131964) increased RSS: rss=1.1GB pss=1.1GB uss=1.1GB shared=1.7MB
MEMORY: total (4 procs): pss=3.0GB uss=2.9GB
MEMORY: proc <unknown-dead>(2131963) exited, old: rss=1.6MB pss=605.0KB uss=228.0KB shared=1.4MB
MEMORY: proc cudafe++(2131964) exited, old: rss=1.1GB pss=1.1GB uss=1.1GB shared=1.7MB
MEMORY: sub proc nvcc(2131947) increased RSS: rss=3.6MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2131969) initial: rss=1.7MB pss=552.0KB uss=224.0KB shared=1.5MB
MEMORY: sub proc gcc(2131970) initial: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2131971) initial: rss=397.0MB pss=395.4MB uss=394.8MB shared=2.2MB
MEMORY: total (5 procs): pss=2.2GB uss=2.2GB
MEMORY: sub proc cc1plus(2131971) increased RSS: rss=0.8GB pss=0.8GB uss=0.8GB shared=2.2MB
MEMORY: total (5 procs): pss=2.7GB uss=2.7GB
Network layer topology:
extern data: data: Tensor{[B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}, seq_tag: Tensor{[B?], dtype='string'}
used data keys: ['data', 'seq_tag']
layers:
layer batch_norm 'conformer_1_conv_mod_bn' #: 512
layer conv 'conformer_1_conv_mod_depthwise_conv' #: 512
layer copy 'conformer_1_conv_mod_dropout' #: 512
layer gating 'conformer_1_conv_mod_glu' #: 512
layer layer_norm 'conformer_1_conv_mod_ln' #: 512
layer linear 'conformer_1_conv_mod_pointwise_conv_1' #: 1024
layer linear 'conformer_1_conv_mod_pointwise_conv_2' #: 512
layer combine 'conformer_1_conv_mod_res_add' #: 512
layer activation 'conformer_1_conv_mod_swish' #: 512
layer copy 'conformer_1_ffmod_1_dropout' #: 512
layer linear 'conformer_1_ffmod_1_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_1_half_res_add' #: 512
layer linear 'conformer_1_ffmod_1_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_1_ln' #: 512
layer copy 'conformer_1_ffmod_2_dropout' #: 512
layer linear 'conformer_1_ffmod_2_dropout_linear' #: 512
layer eval 'conformer_1_ffmod_2_half_res_add' #: 512
layer linear 'conformer_1_ffmod_2_linear_swish' #: 2048
layer layer_norm 'conformer_1_ffmod_2_ln' #: 512
layer linear 'conformer_1_mhsa_mod_att_linear' #: 512
layer copy 'conformer_1_mhsa_mod_dropout' #: 512
layer layer_norm 'conformer_1_mhsa_mod_ln' #: 512
layer relative_positional_encoding 'conformer_1_mhsa_mod_relpos_encoding' #: 64
layer combine 'conformer_1_mhsa_mod_res_add' #: 512
layer self_attention 'conformer_1_mhsa_mod_self_attention' #: 512
layer layer_norm 'conformer_1_output' #: 512
layer conv 'conv_1' #: 32
layer pool 'conv_1_pool' #: 32
layer conv 'conv_2' #: 64
layer conv 'conv_3' #: 64
layer merge_dims 'conv_merged' #: 24000
layer split_dims 'conv_source' #: 1
layer source 'data' #: 1
layer copy 'encoder' #: 512
layer subnetwork 'features' #: 750
layer conv 'features/conv_h' #: 150
layer eval 'features/conv_h_act' #: 150
layer variable 'features/conv_h_filter' #: 150
layer split_dims 'features/conv_h_split' #: 1
layer conv 'features/conv_l' #: 5
layer layer_norm 'features/conv_l_act' #: 750
layer eval 'features/conv_l_act_no_norm' #: 750
layer merge_dims 'features/conv_l_merge' #: 750
layer copy 'features/output' #: 750
layer copy 'input_dropout' #: 512
layer linear 'input_linear' #: 512
layer softmax 'output' #: 88
layer eval 'specaug' #: 750
net params #: 18473980
net trainable params: [<tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_beta:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_bn/batch_norm/conformer_1_conv_mod_bn_conformer_1_conv_mod_bn_output_gamma:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/W:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_depthwise_conv/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/W:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_1/b:0' shape=(1024,) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_conv_mod_pointwise_conv_2/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_1_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/W:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_dropout_linear/b:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/W:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_linear_swish/b:0' shape=(2048,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_ffmod_2_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 
'conformer_1_mhsa_mod_att_linear/W:0' shape=(512, 512) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_ln/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_relpos_encoding/encoding_matrix:0' shape=(65, 64) dtype=float32>, <tf.Variable 'conformer_1_mhsa_mod_self_attention/QKV:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'conformer_1_output/bias:0' shape=(512,) dtype=float32>, <tf.Variable 'conformer_1_output/scale:0' shape=(512,) dtype=float32>, <tf.Variable 'conv_1/W:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'conv_1/bias:0' shape=(32,) dtype=float32>, <tf.Variable 'conv_2/W:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'conv_2/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'conv_3/W:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'conv_3/bias:0' shape=(64,) dtype=float32>, <tf.Variable 'features/conv_h_filter/conv_h_filter:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'features/conv_l/W:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'features/conv_l_act/bias:0' shape=(750,) dtype=float32>, <tf.Variable 'features/conv_l_act/scale:0' shape=(750,) dtype=float32>, <tf.Variable 'input_linear/W:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'output/W:0' shape=(512, 88) dtype=float32>, <tf.Variable 'output/b:0' shape=(88,) dtype=float32>]
2023-11-08 17:09:01.409733: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
start training at epoch 1
using batch size: {'classes': 5000, 'data': 400000}, max seqs: 128
learning rate control: NewbobMultiEpoch(num_epochs=6, update_interval=1, relative_error_threshold=-0.01, relative_error_grow_threshold=-0.01), epoch data: 1: EpochData(learningRate=1.325e-05, error={}), 2: EpochData(learningRate=1.539861111111111e-05, error={}), 3: EpochData(learningRate=1.754722222222222e-05, error={}), ..., 360: EpochData(learningRate=1.4333333333333375e-05, error={}), 361: EpochData(learningRate=1.2166666666666727e-05, error={}), 362: EpochData(learningRate=1e-05, error={}), error key: None
pretrain: None
MEMORY: proc <unknown-dead>(2131947) exited, old: rss=3.6MB pss=2.1MB uss=1.5MB shared=2.0MB
MEMORY: proc <unknown-dead>(2131969) exited, old: rss=1.7MB pss=552.0KB uss=224.0KB shared=1.5MB
MEMORY: proc <unknown-dead>(2131970) exited, old: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2131971) exited, old: rss=0.8GB pss=0.8GB uss=0.8GB shared=2.2MB
MEMORY: main proc python3(2131331) increased RSS: rss=2.3GB pss=2.3GB uss=2.3GB shared=6.4MB
MEMORY: total (1 procs): pss=2.3GB uss=2.3GB
start epoch 1 with learning rate 1.325e-05 ...
TF: log_dir: output/models/train-2023-11-08-16-07-34
Create optimizer <class 'returnn.tf.updater.NadamOptimizer'> with options {'epsilon': 1e-08, 'learning_rate': <tf.Variable 'learning_rate:0' shape=() dtype=float32>}.
Initialize optimizer (default) with slots ['m', 'v'].
These additional variable were created by the optimizer: [<tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_bn/batch_norm/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(1, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32, 1, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_depthwise_conv/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 1024) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_1/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(1024,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_conv_mod_pointwise_conv_2/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_ffmod_1_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_1_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_dropout_linear/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 2048) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_linear_swish/b_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(2048,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_ffmod_2_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_att_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(512, 512) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_ln/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 
'optimize/gradients/conformer_1_mhsa_mod_relpos_encoding/Gather_grad/Reshape_accum_grad/var_accum_grad:0' shape=(65, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_mhsa_mod_self_attention/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 1536) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conformer_1_output/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 1, 32) dtype=float32>, <tf.Variable 'optimize/gradients/conv_1/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(32,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 32, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_2/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(3, 3, 64, 64) dtype=float32>, <tf.Variable 'optimize/gradients/conv_3/bias_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(64,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_h/convolution/ExpandDims_1_grad/Reshape_accum_grad/var_accum_grad:0' shape=(128, 1, 150) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l/convolution_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(40, 1, 1, 5) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/add_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 'optimize/gradients/features/conv_l_act/mul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(750,) dtype=float32>, <tf.Variable 
'optimize/gradients/input_linear/W_gradient_sum/AddN_accum_grad/var_accum_grad:0' shape=(24000, 512) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/dot/MatMul_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(512, 88) dtype=float32>, <tf.Variable 'optimize/gradients/output/linear/add_bias_grad/tuple/control_dependency_1_accum_grad/var_accum_grad:0' shape=(88,) dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta1_power:0' shape=() dtype=float32>, <tf.Variable 'optimize/apply_grads/accum_grad_multiple_step/beta2_power:0' shape=() dtype=float32>].
2023-11-08 17:09:08.816918: W tensorflow/c/c_api.cc:304] Operation '{name:'global_step' id:161 op device:{requested: '/device:CPU:0', assigned: ''} def:{{{node global_step}} = VarHandleOp[_class=["loc:@global_step"], _has_manual_control_dependencies=true, allowed_devices=[], container="", dtype=DT_INT64, shape=[], shared_name="global_step", _device="/device:CPU:0"]()}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
OpCodeCompiler call: /usr/local/cuda-11.8/bin/nvcc -shared -O2 -std=c++17 -I /usr/local/lib/python3.8/dist-packages/tensorflow/include -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/external/nsync/public -ccbin /usr/bin/gcc -I /usr/local/cuda-11.8/targets/x86_64-linux/include -I /usr/local/cuda-11.8/include -L /usr/local/cuda-11.8/lib64 -x cu -v -DGOOGLE_CUDA=1 -Xcompiler -fPIC -Xcompiler -v -I /usr/local/lib/python3.8/dist-packages/tensorflow/include/third_party/gpus/cuda/include -D_GLIBCXX_USE_CXX11_ABI=1 -DNDEBUG=1 -g /var/tmp/vieting/returnn_tf_cache/ops/DevMaxBytesInUse/5fd1f0202b/DevMaxBytesInUse.cc -o /var/tmp/vieting/returnn_tf_cache/ops/DevMaxBytesInUse/5fd1f0202b/DevMaxBytesInUse.so -L/usr/local/lib/python3.8/dist-packages/tensorflow -l:libtensorflow_framework.so.2
MEMORY: main proc python3(2131331) increased RSS: rss=2.6GB pss=2.6GB uss=2.5GB shared=8.8MB
MEMORY: sub proc nvcc(2131988) initial: rss=3.5MB pss=2.0MB uss=1.5MB shared=2.1MB
MEMORY: sub proc sh(2131991) initial: rss=1.6MB pss=565.0KB uss=256.0KB shared=1.4MB
MEMORY: sub proc gcc(2131992) initial: rss=2.5MB pss=1.3MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2131993) initial: rss=43.0MB pss=41.4MB uss=40.9MB shared=2.2MB
MEMORY: total (5 procs): pss=2.6GB uss=2.6GB
MEMORY: proc sh(2131991) exited, old: rss=1.6MB pss=565.0KB uss=256.0KB shared=1.4MB
MEMORY: proc gcc(2131992) exited, old: rss=2.5MB pss=1.3MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2131993) exited, old: rss=43.0MB pss=41.4MB uss=40.9MB shared=2.2MB
MEMORY: sub proc sh(2131994) initial: rss=1.7MB pss=633.0KB uss=232.0KB shared=1.5MB
MEMORY: sub proc cicc(2131995) initial: rss=736.6MB pss=734.8MB uss=733.8MB shared=2.9MB
MEMORY: total (4 procs): pss=3.3GB uss=3.3GB
MEMORY: proc sh(2131994) exited, old: rss=1.7MB pss=633.0KB uss=232.0KB shared=1.5MB
MEMORY: proc cicc(2131995) exited, old: rss=736.6MB pss=734.8MB uss=733.8MB shared=2.9MB
MEMORY: sub proc nvcc(2131988) increased RSS: rss=3.6MB pss=2.2MB uss=1.5MB shared=2.0MB
MEMORY: sub proc sh(2132005) initial: rss=1.6MB pss=613.0KB uss=232.0KB shared=1.4MB
MEMORY: sub proc cudafe++(2132006) initial: rss=242.1MB pss=241.0MB uss=240.5MB shared=1.6MB
MEMORY: total (4 procs): pss=2.8GB uss=2.8GB
MEMORY: proc sh(2132005) exited, old: rss=1.6MB pss=613.0KB uss=232.0KB shared=1.4MB
MEMORY: proc cudafe++(2132006) exited, old: rss=242.1MB pss=241.0MB uss=240.5MB shared=1.6MB
MEMORY: sub proc sh(2132007) initial: rss=1.6MB pss=531.0KB uss=224.0KB shared=1.4MB
MEMORY: sub proc gcc(2132008) initial: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: sub proc cc1plus(2132009) initial: rss=121.0MB pss=119.5MB uss=119.0MB shared=2.1MB
MEMORY: total (5 procs): pss=2.7GB uss=2.7GB
MEMORY: sub proc cc1plus(2132009) increased RSS: rss=515.9MB pss=514.4MB uss=513.9MB shared=2.1MB
MEMORY: total (5 procs): pss=3.1GB uss=3.1GB
SprintSubprocessInstance: exec ['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']
SprintSubprocessInstance: starting, pid 2132023
MEMORY: proc <unknown-dead>(2131988) exited, old: rss=3.6MB pss=2.2MB uss=1.5MB shared=2.0MB
MEMORY: proc <unknown-dead>(2132007) exited, old: rss=1.6MB pss=531.0KB uss=224.0KB shared=1.4MB
MEMORY: proc <unknown-dead>(2132008) exited, old: rss=2.6MB pss=1.4MB uss=1.0MB shared=1.6MB
MEMORY: proc cc1plus(2132009) exited, old: rss=515.9MB pss=514.4MB uss=513.9MB shared=2.1MB
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory
SprintSubprocessInstance: Sprint child process (['/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', '--*.python-control-enabled=true', '--*.pymod-path=/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn', '--*.pymod-name=returnn.sprint.control', '--*.pymod-config=c2p_fd:35,p2c_fd:36,minPythonControlVersion:4', '--*.configuration.channel=output-channel', '--*.real-time-factor.channel=output-channel', '--*.system-info.channel=output-channel', '--*.time.channel=output-channel', '--*.version.channel=output-channel', '--*.log.channel=output-channel', '--*.warning.channel=output-channel,', 'stderr', '--*.error.channel=output-channel,', 'stderr', '--*.statistics.channel=output-channel', '--*.progress.channel=output-channel', '--*.dot.channel=nil', '--*.corpus.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/datasets/switchboard/CreateSwitchboardBlissCorpusJob.Z1EMi4TdrUS6/output/swb.corpus.xml.gz', '--*.corpus.segments.file=/u/vieting/setups/swb/20230406_feat/work/i6_core/corpus/filter/FilterSegmentsByListJob.nrKcBIdsMBZm/output/segments.1', '--*.model-combination.lexicon.file=/u/vieting/setups/swb/20230406_feat/work/i6_experiments/users/berger/recipe/lexicon/modification/MakeBlankLexiconJob.N8RlHYKzilei/output/lexicon.xml', '--*.model-combination.acoustic-model.state-tying.type=lookup', '--*.model-combination.acoustic-model.state-tying.file=/u/vieting/setups/swb/20230406_feat/dependencies/state-tying_blank', '--*.model-combination.acoustic-model.allophones.add-from-lexicon=no', '--*.model-combination.acoustic-model.allophones.add-all=yes', '--*.model-combination.acoustic-model.allophones.add-from-file=/u/vieting/setups/swb/20230406_feat/dependencies/allophones_blank', '--*.model-combination.acoustic-model.hmm.states-per-phone=1', '--*.model-combination.acoustic-model.hmm.state-repetitions=1', '--*.model-combination.acoustic-model.hmm.across-word-model=yes', 
'--*.model-combination.acoustic-model.hmm.early-recombination=no', '--*.model-combination.acoustic-model.tdp.scale=1.0', '--*.model-combination.acoustic-model.tdp.*.loop=0.0', '--*.model-combination.acoustic-model.tdp.*.forward=0.0', '--*.model-combination.acoustic-model.tdp.*.skip=infinity', '--*.model-combination.acoustic-model.tdp.*.exit=0.0', '--*.model-combination.acoustic-model.tdp.silence.loop=0.0', '--*.model-combination.acoustic-model.tdp.silence.forward=0.0', '--*.model-combination.acoustic-model.tdp.silence.skip=infinity', '--*.model-combination.acoustic-model.tdp.silence.exit=0.0', '--*.model-combination.acoustic-model.tdp.entry-m1.loop=infinity', '--*.model-combination.acoustic-model.tdp.entry-m2.loop=infinity', '--*.model-combination.acoustic-model.phonology.history-length=0', '--*.model-combination.acoustic-model.phonology.future-length=0', '--*.transducer-builder-filter-out-invalid-allophones=yes', '--*.fix-allophone-context-at-word-boundaries=yes', '--*.allophone-state-graph-builder.topology=ctc', '--*.allow-for-silence-repetitions=no', '--action=python-control', '--python-control-loop-type=python-control-loop', '--extract-features=no', '--*.encoding=UTF-8', '--*.output-channel.file=$(LOGFILE)', '--*.output-channel.compressed=no', '--*.output-channel.append=no', '--*.output-channel.unbuffered=no', '--*.LOGFILE=nn-trainer.loss.log', '--*.TASK=1']) caused an exception.
MEMORY: total (1 procs): pss=2.6GB uss=2.5GB
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in SprintSubprocessInstance._start_child
line: ret = self._read()
locals:
ret = <not found>
self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
self._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
line: return util.read_pickled_object(p)
locals:
util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
util.read_pickled_object = <global> <function read_pickled_object at 0x7fddcfbc3d30>
p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
locals:
size_raw = <not found>
read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7fddcfbc3ca0>
p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
getvalue = <not found>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
locals:
EOFError = <builtin> <class 'EOFError'>
size = <local> 4
read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
Exception in py_wrap_get_sprint_automata_for_batch:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in SprintSubprocessInstance._start_child
line: ret = self._read()
locals:
ret = <not found>
self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
self._read = <local> <bound method SprintSubprocessInstance._read of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in SprintSubprocessInstance._read
line: return util.read_pickled_object(p)
locals:
util = <global> <module 'returnn.util.basic' from '/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py'>
util.read_pickled_object = <global> <function read_pickled_object at 0x7fddcfbc3d30>
p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
line: size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
locals:
size_raw = <not found>
read_bytes_to_new_buffer = <global> <function read_bytes_to_new_buffer at 0x7fddcfbc3ca0>
p = <local> <_io.FileIO name=34 mode='rb' closefd=True>
getvalue = <not found>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
line: raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
locals:
EOFError = <builtin> <class 'EOFError'>
size = <local> 4
read_size = <local> 0
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in get_sprint_automata_for_batch_op.<locals>.py_wrap_get_sprint_automata_for_batch
line: return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
locals:
py_get_sprint_automata_for_batch = <global> <function py_get_sprint_automata_for_batch at 0x7fdc8c7361f0>
sprint_opts = <local> {'sprintExecPath': '/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-...
tags = <not found>
py_tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
line: edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
locals:
edges = <not found>
weights = <not found>
start_end_states = <not found>
sprint_instance_pool = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
sprint_instance_pool.get_automata_for_batch = <local> <bound method SprintInstancePool.get_automata_for_batch of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
tags = <local> array([b'switchboard-1/sw02721B/sw2721B-ms98-a-0031',
b'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
b'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
b'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
b'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
b'switchboard-1/sw02...
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in SprintInstancePool.get_automata_for_batch
line: instance = self._get_instance(i)
locals:
instance = <not found>
self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
self._get_instance = <local> <bound method SprintInstancePool._get_instance of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
i = <local> 0
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in SprintInstancePool._get_instance
line: self._maybe_create_new_instance()
locals:
self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
self._maybe_create_new_instance = <local> <bound method SprintInstancePool._maybe_create_new_instance of <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in SprintInstancePool._maybe_create_new_instance
line: self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
locals:
self = <local> <returnn.sprint.error_signals.SprintInstancePool object at 0x7fdc85e59f70>
self.instances = <local> []
self.instances.append = <local> <built-in method append of list object at 0x7fdc8489e840>
SprintSubprocessInstance = <global> <class 'returnn.sprint.error_signals.SprintSubprocessInstance'>
self.sprint_opts = <local> {'sprintExecPath': '/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard', 'sprintConfigStr': '--*.configuration.channel=output-channel --*.real-time-factor.channel=output-channel --*.system-info.channel=output-channel --*.time.channel=output-...
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in SprintSubprocessInstance.__init__
line: self.init()
locals:
self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
self.init = <local> <bound method SprintSubprocessInstance.init of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in SprintSubprocessInstance.init
line: self._start_child()
locals:
self = <local> <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>
self._start_child = <local> <bound method SprintSubprocessInstance._start_child of <returnn.sprint.error_signals.SprintSubprocessInstance object at 0x7fdc896c9e50>>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in SprintSubprocessInstance._start_child
line: raise Exception("SprintSubprocessInstance Sprint init failed")
locals:
Exception = <builtin> <class 'Exception'>
Exception: SprintSubprocessInstance Sprint init failed
2023-11-08 17:09:37.114349: W tensorflow/core/framework/op_kernel.cc:1816] UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
2023-11-08 17:09:37.114515: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14907759204653744683
2023-11-08 17:09:37.114540: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11924807411687211681
2023-11-08 17:09:37.114558: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8498381501270362003
2023-11-08 17:09:37.114592: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 328642183433865367
2023-11-08 17:09:37.114608: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15509202514790697743
2023-11-08 17:09:37.114638: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12478617659299189133
2023-11-08 17:09:37.114656: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17119912705987515863
2023-11-08 17:09:37.114671: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1116834209094735605
2023-11-08 17:09:37.114687: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4661036471183676975
2023-11-08 17:09:37.114703: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17206736268075489981
2023-11-08 17:09:37.114723: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11940517361119239617
2023-11-08 17:09:37.114737: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 2075000341389533861
2023-11-08 17:09:37.114757: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1551945598752204051
2023-11-08 17:09:37.114773: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18024994189871473987
2023-11-08 17:09:37.114787: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8039025426040121703
2023-11-08 17:09:37.114801: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12780907590735407947
2023-11-08 17:09:37.114832: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18105505433626603299
2023-11-08 17:09:37.114848: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14023509702728807603
2023-11-08 17:09:37.114861: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4387189208380191869
2023-11-08 17:09:37.114877: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15290859676350821985
2023-11-08 17:09:37.114891: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4708683971917804685
2023-11-08 17:09:37.114905: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 782629118718604739
2023-11-08 17:09:37.114915: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 16178361428648949333
2023-11-08 17:09:37.114930: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 1368360081948114135
2023-11-08 17:09:37.114956: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9684463615367594434
2023-11-08 17:09:37.114970: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 11191673837626951548
2023-11-08 17:09:37.114986: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4601451330222918362
2023-11-08 17:09:37.115000: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 14060714862683982606
2023-11-08 17:09:37.115032: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 16737683200926961030
2023-11-08 17:09:37.115046: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 17857287931859718032
2023-11-08 17:09:37.115059: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5354699002852183842
2023-11-08 17:09:37.115073: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 12547361387349856700
2023-11-08 17:09:37.115087: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15404591707848971056
2023-11-08 17:09:37.115101: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 7479360675682653368
2023-11-08 17:09:37.115115: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15409731113398965776
2023-11-08 17:09:37.115131: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9679296465648687078
2023-11-08 17:09:37.115145: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9282137006686686836
2023-11-08 17:09:37.115158: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 9017255699680893100
2023-11-08 17:09:37.115172: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 16662337826391890718
2023-11-08 17:09:37.115186: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 6549064369067171100
2023-11-08 17:09:37.115225: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5592458713738762450
2023-11-08 17:09:37.115243: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 6034280818993323922
2023-11-08 17:09:37.115258: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18200915710976925794
2023-11-08 17:09:37.115271: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15218690700986048972
2023-11-08 17:09:37.115284: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 8950560704742236676
2023-11-08 17:09:37.115294: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 15258328697247900912
2023-11-08 17:09:37.115308: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5450317640836131402
2023-11-08 17:09:37.115328: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 7607626667450182958
2023-11-08 17:09:37.115342: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 18231059680337670234
2023-11-08 17:09:37.115355: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 15520128163238770216
2023-11-08 17:09:37.115365: I tensorflow/core/framework/local_rendezvous.cc:409] Local rendezvous send item cancelled. Key hash: 6445139679874136070
2023-11-08 17:09:37.115379: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 5004971731649411668
2023-11-08 17:09:37.115409: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous recv item cancelled. Key hash: 4347196143763668518
MEMORY: main proc python3(2131331) increased RSS: rss=2.7GB pss=2.7GB uss=2.7GB shared=6.4MB
MEMORY: total (1 procs): pss=2.7GB uss=2.7GB
MEMORY: main proc python3(2131331) increased RSS: rss=2.8GB pss=2.8GB uss=2.8GB shared=6.4MB
MEMORY: total (1 procs): pss=2.8GB uss=2.8GB
MEMORY: main proc python3(2131331) increased RSS: rss=3.0GB pss=3.0GB uss=3.0GB shared=6.4MB
MEMORY: total (1 procs): pss=3.0GB uss=3.0GB
2023-11-08 17:09:51.148252: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:432] Loaded cuDNN version 8600
MEMORY: main proc python3(2131331) increased RSS: rss=3.2GB pss=3.2GB uss=3.2GB shared=6.4MB
MEMORY: total (1 procs): pss=3.2GB uss=3.2GB
MEMORY: main proc python3(2131331) increased RSS: rss=3.4GB pss=3.4GB uss=3.4GB shared=6.4MB
MEMORY: total (1 procs): pss=3.4GB uss=3.4GB
TensorFlow exception: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
Exception UnknownError() in step 0. (pid 2131331)
Failing op: <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
We tried to fetch the op inputs ([<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>]) but got another exception:
target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7fdd96f6a480>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7fdc8bcf3770>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
handle = <local> None
final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "./returnn/rnn.py", line 11, in <module>\n main()\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 12234
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4341, in help_on_tf_exception
line: debug_fetch, fetch_helpers, op_copied = FetchHelper.copy_graph(
debug_fetch,
target_op=op,
fetch_helper_tensors=list(op.inputs),
stop_at_ts=stop_at_ts,
verbose_stream=file,
)
locals:
debug_fetch = <local> <tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>
fetch_helpers = <not found>
op_copied = <not found>
FetchHelper = <local> <class 'returnn.tf.util.basic.FetchHelper'>
FetchHelper.copy_graph = <local> <bound method FetchHelper.copy_graph of <class 'returnn.tf.util.basic.FetchHelper'>>
target_op = <not found>
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
fetch_helper_tensors = <not found>
list = <builtin> <class 'list'>
op.inputs = <local> (<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>,)
stop_at_ts = <local> [<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>, <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, <tf.Tensor 'extern_data/placeholders/batch_dim:...
verbose_stream = <not found>
file = <local> <returnn.log.Stream object at 0x7fddcfa8ee50>
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/util/basic.py", line 7700, in FetchHelper.copy_graph
line: assert target_op in ops, "target_op %r,\nops\n%s" % (target_op, pformat(ops))
locals:
target_op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
ops = <local> [<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
pformat = <local> <function pformat at 0x7fddd3ddcc10>
AssertionError: target_op <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>,
ops
[<tf.Operation 'extern_data/placeholders/seq_tag/seq_tag' type=Placeholder>]
Step meta information:
{'seq_idx': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30,
31,
32,
33,
34,
35,
36,
37,
38],
'seq_tag': ['switchboard-1/sw02721B/sw2721B-ms98-a-0031',
'switchboard-1/sw02427A/sw2427A-ms98-a-0021',
'switchboard-1/sw02848B/sw2848B-ms98-a-0086',
'switchboard-1/sw04037A/sw4037A-ms98-a-0027',
'switchboard-1/sw02370B/sw2370B-ms98-a-0117',
'switchboard-1/sw02145A/sw2145A-ms98-a-0107',
'switchboard-1/sw02484A/sw2484A-ms98-a-0077',
'switchboard-1/sw02768A/sw2768A-ms98-a-0064',
'switchboard-1/sw03312B/sw3312B-ms98-a-0041',
'switchboard-1/sw02344B/sw2344B-ms98-a-0023',
'switchboard-1/sw04248B/sw4248B-ms98-a-0017',
'switchboard-1/sw02762A/sw2762A-ms98-a-0059',
'switchboard-1/sw03146A/sw3146A-ms98-a-0047',
'switchboard-1/sw03032A/sw3032A-ms98-a-0065',
'switchboard-1/sw02288A/sw2288A-ms98-a-0080',
'switchboard-1/sw02751A/sw2751A-ms98-a-0066',
'switchboard-1/sw02369A/sw2369A-ms98-a-0118',
'switchboard-1/sw04169A/sw4169A-ms98-a-0059',
'switchboard-1/sw02227A/sw2227A-ms98-a-0016',
'switchboard-1/sw02061B/sw2061B-ms98-a-0170',
'switchboard-1/sw02862B/sw2862B-ms98-a-0033',
'switchboard-1/sw03116B/sw3116B-ms98-a-0065',
'switchboard-1/sw03517B/sw3517B-ms98-a-0038',
'switchboard-1/sw02360B/sw2360B-ms98-a-0086',
'switchboard-1/sw02510B/sw2510B-ms98-a-0061',
'switchboard-1/sw03919A/sw3919A-ms98-a-0017',
'switchboard-1/sw02965A/sw2965A-ms98-a-0045',
'switchboard-1/sw03154A/sw3154A-ms98-a-0073',
'switchboard-1/sw02299A/sw2299A-ms98-a-0005',
'switchboard-1/sw04572A/sw4572A-ms98-a-0026',
'switchboard-1/sw02682A/sw2682A-ms98-a-0022',
'switchboard-1/sw02808A/sw2808A-ms98-a-0014',
'switchboard-1/sw04526A/sw4526A-ms98-a-0026',
'switchboard-1/sw03180B/sw3180B-ms98-a-0010',
'switchboard-1/sw03227A/sw3227A-ms98-a-0029',
'switchboard-1/sw03891B/sw3891B-ms98-a-0008',
'switchboard-1/sw03882B/sw3882B-ms98-a-0041',
'switchboard-1/sw03102B/sw3102B-ms98-a-0027',
'switchboard-1/sw02454A/sw2454A-ms98-a-0029']}
Feed dict:
<tf.Tensor 'extern_data/placeholders/batch_dim:0' shape=() dtype=int32>: int(39)
<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: shape (39, 10208, 1), dtype float32, min/max -1.0/1.0, mean/stddev 0.0014351769/0.11459725, Tensor{'data', [B,T|'time:var:extern_data:data'[B],F|F'feature:data'(1)]}
<tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>: shape (39,), dtype int32, min/max 4760/10208, ([ 4760 6246 6372 6861 7296 7499 7534 7622 7824 8031 8295 8431
8690 8675 8667 8886 9084 9199 9163 9156 9274 9262 9540 9668
9678 9719 9711 9902 9989 10010 10020 10073 10006 10102 10131 10112
10130 10178 10208])
<tf.Tensor 'extern_data/placeholders/seq_tag/seq_tag:0' shape=(?,) dtype=string>: type <class 'list'>, Tensor{'seq_tag', [B?], dtype='string'}
<tf.Tensor 'globals/train_flag:0' shape=() dtype=bool>: bool(True)
EXCEPTION
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1379, in BaseSession._do_call
line: return fn(*args)
locals:
fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
args = <local> ({<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.00...
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1362, in BaseSession._do_run.<locals>._run_fn
line: return self._call_tf_sessionrun(options, feed_dict, fetch_list,
target_list, run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._call_tf_sessionrun = <local> <bound method BaseSession._call_tf_sessionrun of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1455, in BaseSession._call_tf_sessionrun
line: return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
fetch_list, target_list,
run_metadata)
locals:
tf_session = <global> <module 'tensorflow.python.client.pywrap_tf_session' from '/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/pywrap_tf_session.py'>
tf_session.TF_SessionRun_wrapper = <global> <built-in method TF_SessionRun_wrapper of PyCapsule object at 0x7fdd96f6a480>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._session = <local> <tensorflow.python.client._pywrap_tf_session.TF_Session object at 0x7fdc8bcf3770>
options = <local> None
feed_dict = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetch_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
target_list = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
run_metadata = <local> None
UnknownError: 2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
EXCEPTION
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 744, in Runner.run
line: fetches_results = sess.run(
fetches_dict, feed_dict=feed_dict, options=run_options
) # type: typing.Dict[str,typing.Union[numpy.ndarray,str]]
locals:
fetches_results = <not found>
sess = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
sess.run = <local> <bound method BaseSession.run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
fetches_dict = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options = <not found>
run_options = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 969, in BaseSession.run
line: result = self._run(None, fetches, feed_dict, options_ptr,
run_metadata_ptr)
locals:
result = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._run = <local> <bound method BaseSession._run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
fetches = <local> {'size:data:0': <tf.Tensor 'extern_data/placeholders/data/data_dim0_size:0' shape=(?,) dtype=int32>, 'loss': <tf.Tensor 'objective/add:0' shape=() dtype=float32>, 'cost:output': <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, 'loss_norm_..., len = 8
feed_dict = <local> {<tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
options_ptr = <local> None
run_metadata_ptr = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1192, in BaseSession._run
line: results = self._do_run(handle, final_targets, final_fetches,
feed_dict_tensor, options, run_metadata)
locals:
results = <not found>
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._do_run = <local> <bound method BaseSession._do_run of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
handle = <local> None
final_targets = <local> [<tf.Operation 'conformer_1_conv_mod_bn/batch_norm/cond/Merge_1' type=Merge>, <tf.Operation 'optim_and_step_incr' type=NoOp>]
final_fetches = <local> [<tf.Tensor 'objective/add:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss/FastBaumWelchLoss/generic_loss_and_error_signal:0' shape=() dtype=float32>, <tf.Tensor 'objective/loss/loss_init/truediv:0' shape=() dtype=float32>, <tf.Tensor 'globals/mem_usage_deviceGPU0:0' shape=() dtype=in...
feed_dict_tensor = <local> {<Reference wrapping <tf.Tensor 'extern_data/placeholders/data/data:0' shape=(?, ?, 1) dtype=float32>>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049...
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1372, in BaseSession._do_run
line: return self._do_call(_run_fn, feeds, fetches, targets, options,
run_metadata)
locals:
self = <local> <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>
self._do_call = <local> <bound method BaseSession._do_call of <tensorflow.python.client.session.Session object at 0x7fddcf9b4d60>>
_run_fn = <local> <function BaseSession._do_run.<locals>._run_fn at 0x7fdc85e97b80>
feeds = <local> {<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8b849530>: array([[[-0.05505638],
[-0.09610788],
[-0.05115783],
...,
[ 0. ],
[ 0. ],
[ 0. ]],
[[-0.00226238],
[-0.01049833],
[-0.001...
fetches = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc879fa270>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc87a00bb0>, <tensorflow.python.client._pywrap_tf_session.TF_Output object at 0x7fdc8c73f530>, <tensorflow.python.client._pywrap_tf_session.TF_Ou...
targets = <local> [<tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b370>, <tensorflow.python.client._pywrap_tf_session.TF_Operation object at 0x7fd9c570b3b0>]
options = <local> None
run_metadata = <local> None
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/client/session.py", line 1398, in BaseSession._do_call
line: raise type(e)(node_def, op, message) # pylint: disable=no-value-for-parameter
locals:
type = <builtin> <class 'type'>
e = <not found>
node_def = <local> name: "objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch"
op: "PyFunc"
input: "extern_data/placeholders/seq_tag/seq_tag"
attr {
key: "token"
value {
s: "pyfunc_0"
}
}
attr {
key: "Tout"
value {
list {
type: DT_INT32
type: DT_FLOAT
type: DT_INT...
op = <local> <tf.Operation 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' type=PyFunc>
message = <local> 'Graph execution error:\n\nDetected at node \'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch\' defined at (most recent call last):\n File "./returnn/rnn.py", line 11, in <module>\n main()\n File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__mai..., len = 12234
UnknownError: Graph execution error:
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
Detected at node 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch' defined at (most recent call last):
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
Node: 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch'
2 root error(s) found.
(0) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
[[objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch/_661]]
(1) UNKNOWN: Exception: SprintSubprocessInstance Sprint init failed
Traceback (most recent call last):
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 165, in _start_child
ret = self._read()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 226, in _read
return util.read_pickled_object(p)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2629, in read_pickled_object
size_raw = read_bytes_to_new_buffer(p, 4).getvalue()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/util/basic.py", line 2612, in read_bytes_to_new_buffer
raise EOFError("expected to read %i bytes but got EOF after %i bytes" % (size, read_size))
EOFError: expected to read 4 bytes but got EOF after 0 bytes
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 268, in __call__
ret = func(*args)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/autograph/impl/api.py", line 643, in wrapper
return func(*args, **kwargs)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 45, in py_wrap_get_sprint_automata_for_batch
return py_get_sprint_automata_for_batch(sprint_opts=sprint_opts, tags=py_tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 20, in py_get_sprint_automata_for_batch
edges, weights, start_end_states = sprint_instance_pool.get_automata_for_batch(tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 512, in get_automata_for_batch
instance = self._get_instance(i)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 418, in _get_instance
self._maybe_create_new_instance()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 406, in _maybe_create_new_instance
self.instances.append(SprintSubprocessInstance(**self.sprint_opts))
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 81, in __init__
self.init()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 303, in init
self._start_child()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/sprint/error_signals.py", line 170, in _start_child
raise Exception("SprintSubprocessInstance Sprint init failed")
Exception: SprintSubprocessInstance Sprint init failed
[[{{node objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch}}]]
0 successful operations.
0 derived errors ignored.
Original stack trace for 'objective/loss/loss/FastBaumWelchLoss/get_sprint_automata_for_batch':
File "./returnn/rnn.py", line 11, in <module>
main()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 634, in main
execute_main_task()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/__main__.py", line 439, in execute_main_task
engine.init_train_from_config(config, train_data, dev_data, eval_data)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1149, in init_train_from_config
self.init_network_from_config(config)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1234, in init_network_from_config
self._init_network(net_desc=net_dict, epoch=self.epoch)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1429, in _init_network
self.network, self.updater = self.create_network(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/engine.py", line 1491, in create_network
updater = Updater(config=config, network=network, initial_learning_rate=initial_learning_rate)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/updater.py", line 172, in __init__
self.loss = network.get_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1552, in get_objective
self.maybe_construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1545, in maybe_construct_objective
self._construct_objective()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1529, in _construct_objective
losses_dict, total_loss, total_constraints = self.get_losses_initialized(with_total=True)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 1499, in get_losses_initialized
if loss_obj.get_loss_value_for_objective() is not None:
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 3957, in get_loss_value_for_objective
self._prepare()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/network.py", line 4080, in _prepare
self._loss_value = self.loss.get_value()
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/layers/basic.py", line 13165, in get_value
fwdbwd, obs_scores = fast_baum_welch_by_sprint_automata(
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/native_op.py", line 1420, in fast_baum_welch_by_sprint_automata
edges, weights, start_end_states = get_sprint_automata_for_batch_op(sprint_opts=sprint_opts, tags=tags)
File "/work/asr4/vieting/tmp/20231108_tf213_sprint_op/returnn/returnn/tf/sprint.py", line 54, in get_sprint_automata_for_batch_op
edges, weights, start_end_states = tf_compat.v1.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/deprecation.py", line 371, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/traceback_utils.py", line 150, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 1176, in op_dispatch_handler
return dispatch_target(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 678, in py_func
return py_func_common(func, inp, Tout, stateful, name=name)
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 653, in py_func_common
return _internal_py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/script_ops.py", line 378, in _internal_py_func
result = gen_script_ops.py_func(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 149, in py_func
_, _, _op, _outputs = _op_def_library._apply_op_helper(
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/op_def_library.py", line 795, in _apply_op_helper
op = g._create_op_internal(op_type_name, inputs, dtypes=None,
File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/ops.py", line 3381, in _create_op_internal
ret = Operation.from_node_def(
Save model under output/models/epoch.001.crash_0
Trainer not finalized, quitting. (pid 2131331)
In this log, it seems that RASR does not start at all. I see:
/work/asr4/vieting/programs/rasr/20230707/rasr/arch/linux-x86_64-standard/nn-trainer.linux-x86_64-standard: error while loading shared libraries: libtensorflow_cc.so.2: cannot open shared object file: No such file or directory
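A loader error like the one above means the dynamic linker could not resolve one of the binary's dependencies inside the current environment (here, the apptainer image). The following is a minimal sketch of how one might check this before starting a training, assuming a Linux system with `ldd` available. The function names `parse_missing_libs` and `missing_libs_for_binary` are hypothetical helpers made up for this illustration; they are not part of RETURNN or RASR.

```python
import subprocess
from typing import List


def parse_missing_libs(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved dependencies as "<name> => not found".
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs_for_binary(binary_path: str) -> List[str]:
    """Run ldd on a binary (Linux only) and list unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=False
    )
    return parse_missing_libs(result.stdout)
```

Calling `missing_libs_for_binary(...)` with the path of the `nn-trainer` binary from inside the container would presumably list `libtensorflow_cc.so.2` in this case. If so, the usual remedies are to make the directory containing that library visible inside the image, or to add it to `LD_LIBRARY_PATH`. Which one applies depends on how the image and the RASR build were set up, which this thread does not show.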
Btw, the RASR segmentation fault actually looks like a bug in RASR. RASR should never segfault.
Most RASR problems result in a segmentation fault. Sometimes you get more information; sometimes the cause is just an inconsistent compilation.
Whenever RASR gives a segfault, that's a bug in RASR. It should never segfault. Can you link corresponding RASR issues here? Or if this is not reported yet, can you open a corresponding RASR issue?