Tina Raissi
I would just like to share one use case with Python 3.8 and full-sum training: our HiWi Daniel had some additional scientific packages installed, and even though the RETURNN run script...
> In what way is this specific to Python 3.8

Since no further investigation was done, I cannot express any further thoughts on this. However, your high confidence in underlying...
> some mess-up in the system or environment

In which context do you think we should follow up with more investigation on this? I think this is an important issue. Implementation...
> @Marvin84 You as well?

No, I did not experience this during my training. I use a single GPU and no conv. layers. However, I was interested to know whether this was...
We encountered this bug and there is a patch for it. Daniel wanted to do a PR.

On Wed, Nov 8, 2023, 12:25 vieting wrote:

> Sure, the full...
```
diff --git a/returnn/sprint/error_signals.py b/returnn/sprint/error_signals.py
index 735ac363..1c204e68 100644
--- a/returnn/sprint/error_signals.py
+++ b/returnn/sprint/error_signals.py
@@ -130,7 +130,7 @@ class SprintSubprocessInstance:
 
     def _start_child(self):
         assert self.child_pid...
```
AFAIR, the problem occurs only when running in an Apptainer environment. The buffer does not contain all the info, and RETURNN crashes because the RASR automata are truncated / incomplete.
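
To illustrate the kind of failure described here, a minimal sketch follows. It is not the patch itself and not RETURNN or RASR code. It assumes the truncation comes from a short read on a pipe, which this thread does not confirm. `read_exact` is a hypothetical helper name. The only fact it relies on is that `os.read` may return fewer bytes than requested, so a reader that expects a fixed-size message has to loop until everything has arrived.

```python
import os


def read_exact(fd: int, size: int) -> bytes:
    """Read exactly `size` bytes from file descriptor `fd`.

    Hypothetical helper for illustration only. A single os.read() call may
    return fewer bytes than requested, so we keep reading until the full
    message has arrived or the other side closes the pipe.
    """
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:
            # Empty result means EOF: the writer closed before sending everything.
            raise EOFError(f"expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

With a helper like this, an incomplete message shows up as an explicit `EOFError` at the read site rather than as a crash later on when the truncated data is parsed.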
> Can you link the full patch? It seems incomplete here.

Sure, just edited the comment.
@christophmluscher @NeoLegends does this relate to the rasr compiled with TF 2.13? Do you recognize this error?
Most RASR problems result in a segmentation fault. Sometimes you get more info; sometimes it is only about an inconsistent compilation.

On Wed, Nov 8, 2023, 17:43 Albert Zeyer ...