test_rel_pos_self_attention (nondeterministic? rare?) exception: dim_orig is None

Open albertz opened this issue 8 months ago • 0 comments
I'm seeing this now for the first time (here). It's maybe/probably non-deterministic and rare? (Not sure if some hiccup could cause this.) Anyway, reporting here now.
File "/home/runner/.local/lib/python3.10/site-packages/_pytest/python.py", line 159, in pytest_pyfunc_call
    line: result = testfunction(**testargs)
    locals:
      result = <not found>
      testfunction = <local> <function test_rel_pos_self_attention at 0x7f9aba50b0a0>
      testargs = <local> {}
  File "/home/runner/work/returnn/returnn/tests/test_rf_attention.py", line 633, in test_rel_pos_self_attention
    line: run_model(extern_data, lambda *, epoch, step: _Net(), _forward_step, test_tensorflow=False)
    locals:
      run_model = <global> <function run_model at 0x7f9abd5cae60>
      extern_data = <local> TensorDict({'data': Tensor{'data', [B,'time'[B],'in'(8)]}})
      epoch = <not found>
      step = <not found>
      _Net = <local> <class 'test_rf_attention.test_rel_pos_self_attention.<locals>._Net'>
      _forward_step = <local> <function test_rel_pos_self_attention.<locals>._forward_step at 0x7f9a800a2ef0>
      test_tensorflow = <not found>
  File "/home/runner/work/returnn/returnn/tests/rf_utils.py", line 103, in run_model
    line: _run_model_torch_single_batch(extern_data, get_model, forward_step, batch_idx=batch_idx, ref_output=out_pt)
    locals:
      _run_model_torch_single_batch = <global> <function _run_model_torch_single_batch at 0x7f9abd4188b0>
      extern_data = <local> TensorDict({'data': Tensor{'data', [B,'time'[B],'in'(8)]}})
      get_model = <local> <function test_rel_pos_self_attention.<locals>.<lambda> at 0x7f9a6f66b250>
      forward_step = <local> <function test_rel_pos_self_attention.<locals>._forward_step at 0x7f9a800a2ef0>
      batch_idx = <local> 4
      ref_output = <not found>
      out_pt = <local> TensorDict({'output': Tensor{'output', [B,'time'[B],F|'out'(5)]}})
  File "/home/runner/work/returnn/returnn/tests/rf_utils.py", line 293, in _run_model_torch_single_batch
    line: output = _run_model_torch(extern_data, get_model, forward_step)
    locals:
      output = <not found>
      _run_model_torch = <global> <function _run_model_torch at 0x7f9abd4185e0>
      extern_data = <local> TensorDict({'data': Tensor{'data', [B,'time'[B],'in'(8)]}})
      get_model = <local> <function test_rel_pos_self_attention.<locals>.<lambda> at 0x7f9a6f66b250>
      forward_step = <local> <function test_rel_pos_self_attention.<locals>._forward_step at 0x7f9a800a2ef0>
  File "/home/runner/work/returnn/returnn/tests/rf_utils.py", line 170, in _run_model_torch
    line: forward_step(model=model, extern_data=extern_data)
    locals:
      forward_step = <local> <function test_rel_pos_self_attention.<locals>._forward_step at 0x7f9a800a2ef0>
      model = <local> <_Net>
      extern_data = <local> TensorDict({'data': Tensor{'data', [B,'time'[B],'in'(8)]}})
  File "/home/runner/work/returnn/returnn/tests/test_rf_attention.py", line 628, in test_rel_pos_self_attention.<locals>._forward_step
    line: out = model(extern_data["data"], axis=time_dim)
    locals:
      out = <not found>
      model = <local> <_Net>
      extern_data = <local> TensorDict({'data': Tensor{'data', [B,'time'[B],'in'(8)]}})
      axis = <not found>
      time_dim = <local> Dim{'time'[B]}
  File "/home/runner/work/returnn/returnn/tests/test_rf_attention.py", line 613, in test_rel_pos_self_attention.<locals>._Net.__call__
    line: y_b = self.self_att(x_b, axis=axis_b)
    locals:
      y_b = <not found>
      self = <local> <_Net>
      self.self_att = <local> <RelPosSelfAttention>
      x_b = <local> Tensor{'replace_dim', ['gather'[],'in'(8)]}
      axis = <local> Dim{'time'[B]}
      axis_b = <local> Dim{'gather'[]}
  File "/home/runner/work/returnn/returnn/returnn/frontend/attention.py", line 456, in RelPosSelfAttention.__call__
    line: pos_emb, pos_emb_spatial_dim = relative_positional_encoding(
              query_spatial_dim=axis, key_value_spatial_dim=axis, feat_dim=self.pos_emb_feat_dim
          )
    locals:
      pos_emb = <not found>
      pos_emb_spatial_dim = <not found>
      relative_positional_encoding = <global> <function relative_positional_encoding at 0x7f9ba3afcc10>
      query_spatial_dim = <not found>
      axis = <local> Dim{'gather'[]}
      key_value_spatial_dim = <not found>
      feat_dim = <not found>
      self = <local> <RelPosSelfAttention>
      self.pos_emb_feat_dim = <local> Dim{'in'(8)}
  File "/home/runner/work/returnn/returnn/returnn/frontend/attention.py", line 928, in relative_positional_encoding
    line: cache_entry = _relative_positional_encoding_cache.get(cache_key)
    locals:
      cache_entry = <not found>
      _relative_positional_encoding_cache = <global> <returnn.frontend._cache.Cache CacheInfo(hits=7, misses=36, maxsize=128, currsize=25)>
      _relative_positional_encoding_cache.get = <global> <bound method Cache.get of <returnn.frontend._cache.Cache CacheInfo(hits=7, misses=36, maxsize=128, currsize=25)>>
      cache_key = <local> (Dim{'gather'[]}, Dim{'gather'[]}, Dim{'in'(8)}, 0, 'float32')
  File "/home/runner/work/returnn/returnn/returnn/frontend/_cache.py", line 63, in Cache.get
    line: assert isinstance(dim_orig, Dim) and isinstance(dim, Dim)
    locals:
      isinstance = <builtin> <built-in function isinstance>
      dim_orig = <local> None
      Dim = <global> <class 'returnn.tensor.dim.Dim'>
      dim = <local> Dim{'gather'[]}
AssertionError
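
One possible reading of `dim_orig = None`, stated as an assumption and not verified against `_cache.py`: if the cache keeps its key dims only via weak references, then a dim that was garbage-collected after the entry was stored would dereference to `None` on a later lookup, while the dim from the current key is still alive. The sketch below only illustrates that generic `weakref` behaviour. `FakeDim` and `deref_pair` are hypothetical names for illustration, not RETURNN code.

```python
import gc
import weakref


class FakeDim:
    """Hypothetical stand-in for a dim-like object (not returnn.tensor.dim.Dim)."""

    def __init__(self, name):
        self.name = name


def deref_pair(ref_orig, ref_new):
    """Dereference two weak references; a dead referent yields None."""
    return ref_orig(), ref_new()


dim_orig = FakeDim("gather")  # the dim stored with the cache entry (assumption)
dim_new = FakeDim("gather")  # the dim in the key of the current lookup
ref_orig = weakref.ref(dim_orig)
ref_new = weakref.ref(dim_new)

# Record only booleans, so that no strong reference to the dims is kept around.
alive_before = tuple(obj is not None for obj in deref_pair(ref_orig, ref_new))

# Drop the last strong reference to the stored dim, then force a collection.
del dim_orig
gc.collect()

alive_after = tuple(obj is not None for obj in deref_pair(ref_orig, ref_new))
print(alive_before, alive_after)
```

If something like this is what happens, whether the failure shows up would depend on when the old dim gets collected, which would fit the rare and seemingly non-deterministic behaviour. This is a guess, though.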
May 08 '25 16:05 albertz