
Convergence issue with segmental length model

Open robin-p-schmitt opened this issue 3 years ago • 3 comments

When using the config in https://gist.github.com/robin-p-schmitt/41da5e1274ccb93be22881f2f1fe91ba, the model does not converge at all. Looking at the learning rate file:

1: EpochData(learningRate=0.0001, error={
'dev_error_ctc': 0.9583588315669095,
'dev_error_label_model/label_prob': 0.6887621695767638,
'dev_error_label_model/length_model': 0.9999999998499519,
'dev_score_ctc': 0.0,
'dev_score_label_model/label_prob': 46.05170047498763,
'dev_score_label_model/length_model': float('nan'),
'devtrain_error_ctc': 0.9618647864437795,
'devtrain_error_label_model/label_prob': 0.6914862408234509,
'devtrain_error_label_model/length_model': 1.0000000012823582,
'devtrain_score_ctc': 0.0,
'devtrain_score_label_model/label_prob': 46.051700532973975,
'devtrain_score_label_model/length_model': float('nan'),
'train_error_ctc': 0.9321049111097007,
'train_error_label_model/label_prob': 0.7323609187074502,
'train_error_label_model/length_model': 0.9986981431224986,
'train_score_ctc': 0.07518191292189359,
'train_score_label_model/label_prob': 45.89842947947897,
'train_score_label_model/length_model': float('nan'),
}),
2: EpochData(learningRate=0.00019999999999999998, error={
'dev_error_ctc': 0.9583588315669095,
'dev_error_label_model/label_prob': 0.6887621695767638,
'dev_error_label_model/length_model': 0.9999999998499519,
'dev_score_ctc': 0.0,
'dev_score_label_model/label_prob': 46.05170047498763,
'dev_score_label_model/length_model': float('nan'),
'devtrain_error_ctc': 0.9618647864437795,
'devtrain_error_label_model/label_prob': 0.6914862408234509,
'devtrain_error_label_model/length_model': 1.0000000012823582,
'devtrain_score_ctc': 0.0,
'devtrain_score_label_model/label_prob': 46.051700532973975,
'devtrain_score_label_model/length_model': float('nan'),
'train_error_ctc': 0.9308947519972764,
'train_error_label_model/label_prob': 0.7300633606144286,
'train_error_label_model/length_model': 1.000000004447767,
'train_score_ctc': 0.0,
'train_score_label_model/label_prob': 46.05170046216968,
'train_score_label_model/length_model': float('nan'),
}),
3: EpochData(learningRate=0.0003, error={
'dev_error_ctc': 0.9583588315669095,
'dev_error_label_model/label_prob': 0.6887621695767638,
'dev_error_label_model/length_model': 0.9999999998499519,
'dev_score_ctc': 0.0,
'dev_score_label_model/label_prob': 46.05170047498763,
'dev_score_label_model/length_model': float('nan'),
'devtrain_error_ctc': 0.9618647864437795,
'devtrain_error_label_model/label_prob': 0.6914862408234509,
'devtrain_error_label_model/length_model': 1.0000000012823582,
'devtrain_score_ctc': 0.0,
'devtrain_score_label_model/label_prob': 46.051700532973975,
'devtrain_score_label_model/length_model': float('nan'),
'train_error_ctc': 0.9309530333980968,
'train_error_label_model/label_prob': 0.7295836710735882,
'train_error_label_model/length_model': 1.0000000031188645,
'train_score_ctc': 0.0,
'train_score_label_model/label_prob': 46.051700434792,
'train_score_label_model/length_model': float('nan'),
}),

the scores and errors do not change at all between epochs, and the scores of the length_model are NaN. I already looked at the targets and the output of the length_model layer, and they look correct (the targets are single numbers and the output is a normalized vector with 20 values). However, the problem seems to be caused by the length_model, because the model converges fine when this layer is not present. I think the relevant layers are:

    "label_model": {
        "back_prop": True,
        "class": "rec",
        "from": "data:label_ground_truth",
        "include_eos": True,
        "is_output_layer": True,
        "name_scope": "output/rec",
        "unit": {
            "length_model": {
                "activation": "softmax",
                "class": "linear",
                "from": "length_model0",
                "is_output_layer": True,
                "loss": "ce",
                "target": "segment_lens_target",
            },
            "length_model0": {
                "L2": 0.0001,
                "class": "rec",
                "dropout": 0.3,
                "from": ["non_blank_embed_128", "pooled_segment"],
                "n_out": 128,
                "unit": "nativelstm2",
                "unit_opts": {"rec_weight_dropout": 0.3},
            },
            "non_blank_embed_128": {
                "activation": None,
                "class": "linear",
                "from": "output",
                "n_out": 128,
                "with_bias": False,
            },
            "pool_segments": {"class": "copy", "from": "segments"},
            "pooled_segment": {
                "axes": ["stag:att_t"],
                "class": "reduce",
                "from": "pool_segments",
                "mode": "mean",
            },
            "segment_lens": {
                "axis": "t",
                "class": "gather",
                "from": "base:data:segment_lens_masked",
                "position": ":i",
            },
            "segment_starts": {
                "axis": "t",
                "class": "gather",
                "from": "base:data:segment_starts_masked",
                "position": ":i",
            },
            "segments": {
                "class": "reinterpret_data",
                "from": "segments0",
                "set_dim_tags": {
                    "stag:sliced-time:segments": Dim(
                        kind=Dim.Types.Spatial, description="att_t"
                    )
                },
            },
            "segments0": {
                "class": "slice_nd",
                "from": "base:encoder",
                "size": "segment_lens",
                "start": "segment_starts",
            },
        },
    },
    "output": {
        "back_prop": True,
        "class": "rec",
        "from": "encoder",
        "include_eos": True,
        "size_target": "targetb",
        "target": "targetb",
        "unit": {
            "const1": {"class": "constant", "value": 1},
            "output": {
                "beam_size": 4,
                "cheating": "exclusive",
                "class": "choice",
                "from": "data",
                "initial_output": 1030,
                "input_type": "log_prob",
                "target": "targetb",
            },
            "output_emit": {
                "class": "compare",
                "from": "output",
                "initial_output": True,
                "kind": "not_equal",
                "value": 1031,
            },
            "segment_lens": {
                "class": "combine",
                "from": ["segment_lens0", "const1"],
                "is_output_layer": True,
                "kind": "add",
            },
            "segment_lens0": {
                "class": "combine",
                "from": [":i", "segment_starts"],
                "kind": "sub",
            },
            "segment_starts": {
                "class": "switch",
                "condition": "prev:output_emit",
                "false_from": "prev:segment_starts",
                "initial_output": 0,
                "is_output_layer": True,
                "true_from": ":i",
            },
        },
    },
    "segment_lens_masked": {
        "class": "masked_computation",
        "from": "output/segment_lens",
        "mask": "is_label",
        "out_spatial_dim": Dim(kind=Dim.Types.Spatial, description="label-axis"),
        "register_as_extern_data": "segment_lens_masked",
        "unit": {"class": "copy", "from": "data"},
    },
    "segment_lens_sparse": {
        "class": "reinterpret_data",
        "from": "segment_lens_masked",
        "register_as_extern_data": "segment_lens_target",
        "set_sparse": True,
        "set_sparse_dim": 20,
    },

robin-p-schmitt avatar May 16 '22 10:05 robin-p-schmitt

Do you expect some RETURNN bug here? Usually, such NaN/inf issues or convergence issues are user errors.

I see you have "set_sparse_dim": 20 for segment_lens_target.

And then:

"length_model": {
    "activation": "softmax",
    "class": "linear",
    "from": "length_model0",
    "is_output_layer": True,
    "loss": "ce",
    "target": "segment_lens_target",
},

Maybe you should dump the actual targets. If those are outside that range, this could lead to NaN.
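A minimal numpy sketch (not the RETURNN internals) of why an out-of-range target can break cross-entropy: if the gather for the target class silently yields a probability of 0 instead of raising an error (assumed behavior of some GPU kernels for invalid indices), the loss becomes inf, and NaN once that mixes into gradients and averages.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over 3 classes

def ce_loss(probs, target):
    # Emulate a gather that silently returns 0 for an invalid index
    # instead of raising an error (assumption for illustration).
    p = probs[target] if 0 <= target < len(probs) else 0.0
    with np.errstate(divide="ignore"):
        return -np.log(p)

ce_loss(probs, 1)  # finite loss for a valid target
ce_loss(probs, 5)  # inf for a target outside [0, 3)
```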

Maybe on the RETURNN side, we could add some flag like debug_extra_checks or so, which could then enable an extra check here for valid indices.
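Until such a flag exists, the check is easy to run by hand on a dump of the targets; a sketch (the helper and the example values are made up, only the sparse dim of 20 comes from the config):

```python
import numpy as np

def check_sparse_targets(targets, sparse_dim):
    # Return positions of targets outside [0, sparse_dim) -- the kind
    # of assertion a debug_extra_checks flag could run before the CE
    # loss. (Sketch only, not RETURNN code.)
    targets = np.asarray(targets)
    return np.flatnonzero((targets < 0) | (targets >= sparse_dim))

# A segment length of 25 does not fit into a sparse dim of 20:
bad = check_sparse_targets([3, 7, 25, 1], sparse_dim=20)
# bad == [2]
```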

albertz avatar May 16 '22 10:05 albertz

If those are outside that range, this could lead to NaN

Oh okay, I thought this would lead to an error in RETURNN. I will check the targets and will get back here once I know.

robin-p-schmitt avatar May 16 '22 10:05 robin-p-schmitt

@robin-p-schmitt Did you ever check this? What was the result?

Or is this not relevant anymore for you? Then let's close this issue.

albertz avatar Sep 23 '22 09:09 albertz