Albert Zeyer

Results: 1028 comments by Albert Zeyer

Why do you think this error is in KenLM? The error occurs in `NeMo/eval_beamsearch_ngram_ctc.py`. So this sounds like it is in NeMo? Also `EncDecCTCModelBPE` is not from KenLM.

You can query the limits via `sacctmgr show User $(whoami) -s`. This gives: ``` User Def Acct Admin Cluster Account Partition Share Priority MaxJobs MaxNodes MaxCPUs MaxSubmit...

> What about just stop submitting jobs when we get a `sbatch: error: AssocMaxSubmitJobLimit`, wait for a certain amount of time and try again?

Yes, this is also what I...
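The wait-and-retry idea from the quote could be sketched roughly as follows. This is only an illustration, not the code used in any of the projects discussed here: the names `submit_with_retry` and `run_sbatch`, the default wait time, and the attempt limit are all hypothetical, and it assumes the limit error shows up as the string `AssocMaxSubmitJobLimit` on the stderr of `sbatch`.

```python
import subprocess
import time
from typing import Callable, Sequence

# Error marker taken from the quoted sbatch message.
LIMIT_ERROR = "AssocMaxSubmitJobLimit"


def run_sbatch(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run sbatch and capture its output, without raising on a non-zero exit."""
    return subprocess.run(["sbatch", *args], capture_output=True, text=True)


def submit_with_retry(
    args: Sequence[str],
    *,
    wait_seconds: float = 60.0,  # hypothetical default
    max_attempts: int = 10,  # hypothetical default
    run: Callable[[Sequence[str]], subprocess.CompletedProcess] = run_sbatch,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Submit a job; if the submit limit is hit, wait and try again.

    Returns the stdout of the successful sbatch call.
    Any other sbatch failure is raised immediately.
    """
    for attempt in range(1, max_attempts + 1):
        result = run(args)
        if result.returncode == 0:
            return result.stdout.strip()
        if LIMIT_ERROR not in result.stderr:
            # Not the submit-limit error, so retrying would not help.
            raise RuntimeError(f"sbatch failed: {result.stderr.strip()}")
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(f"still hitting {LIMIT_ERROR} after {max_attempts} attempts")
```

The `run` and `sleep` parameters are injectable so that the retry logic can be exercised without a Slurm cluster.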

I think this may actually have been a hiccup in the cluster as well. Some FS might have been temporarily unavailable. That would explain this: ``` OSError: [Errno 107] Transport...

Note that the second issue (`ValueError: Dividing a Tensor of type int by an integer is disallowed`) is somewhat unrelated and was already reported: #1749. I just reported it here again...

What I still have in mind is a possible race condition (similar to #1785): `_copy_file_if_needed` calls: ```python # Make sure we have enough disk space, st_size +1 due to...

Another instance of this error: https://github.com/rwth-i6/returnn/actions/runs/18658137586/job/53192158861

A side remark: I also wonder why I get this `GLIBCXX_3.4.30` error now. I think I got this before but somehow resolved it (I forgot what I did, though...), but...
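One generic way to narrow down such a `GLIBCXX_...` error is to check which `GLIBCXX` version strings a given `libstdc++.so.6` actually contains, similar to running `strings` on the library and filtering for `GLIBCXX`. The following is a minimal sketch under that assumption. The helper names `glibcxx_versions` and `has_glibcxx` are hypothetical, and which library file is actually loaded in the session above is not known from the comment.

```python
import re
from pathlib import Path

# Matches version tags such as GLIBCXX_3.4.30 embedded in the shared library.
_VERSION_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX version strings found in the given file."""
    data = Path(lib_path).read_bytes()
    found = {m.group(1).decode("ascii") for m in _VERSION_RE.finditer(data)}
    # Sort numerically, so that 3.4.9 comes before 3.4.30.
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def has_glibcxx(lib_path, version):
    """Check whether e.g. version "3.4.30" is provided by the library file."""
    return version in glibcxx_versions(lib_path)
```

If `has_glibcxx(path, "3.4.30")` is false for the `libstdc++.so.6` that the process picks up, then an older library is likely being found first, for example via `LD_LIBRARY_PATH`.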

In my current interactive session, I can reproduce the compile error when running the tests. In this session, `g++` points to `/cvmfs/software.hpc.rwth.de/Linux/RH9/x86_64/intel/sapphirerapids/software/GCCcore/13.3.0/bin/g++`. ``` Traceback (most recent call last): File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/tests/test_rf_base.py",...

Weird. So I did `module load GCCcore/13.3.0`. But `g++` was actually already pointing to GCC 13.3.0 before (see above). `module load GCCcore/13.3.0` produced this output: ``` [INFO] Module zlib/1.3.1 loaded. [INFO]...