
t2492-shell-lost.sh: job gets SIGINT too early

Status: Open • grondo opened this issue 11 months ago • 0 comments

Saw this one in CI. It appears that SIGINT is being delivered to the job while tasks are still importing modules: both tracebacks below show a KeyboardInterrupt raised during "import flux". I'm not sure how that can happen, since there is a barrier call in the test that should prevent it, so something unexpected is going on here.
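For reference, the ordering the barrier is expected to guarantee can be modeled with plain Python threads. This is a hypothetical analogy, not Flux code and not the actual t2492 test: the names task, driver, and startup_done are illustrative, a sleep stands in for "import flux", and it assumes the signal sender waits on the same barrier as the tasks. Under that assumption the sender can never fire while any task is still starting up; if the sender instead waits on something that can complete before every task reaches the barrier, the guarantee no longer holds.

```python
import threading
import time

NTASKS = 4
# Hypothetical model: the tasks AND the signal-sending driver all wait at one barrier.
barrier = threading.Barrier(NTASKS + 1)
startup_done = [False] * NTASKS
signal_sent_early = []


def task(rank: int) -> None:
    # Stand-in for slow, uneven startup work such as "import flux".
    time.sleep(0.01 * rank)
    startup_done[rank] = True
    barrier.wait()


def driver() -> None:
    # Blocks until every task has finished startup and reached the barrier.
    barrier.wait()
    # Record whether any task was still "importing" when the signal would be sent.
    signal_sent_early.append(not all(startup_done))


threads = [threading.Thread(target=task, args=(r,)) for r in range(NTASKS)]
threads.append(threading.Thread(target=driver))
for t in threads:
    t.start()
for t in threads:
    t.join()

print("signal sent before all tasks finished startup:", signal_sent_early[0])
```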

2024-03-15T21:46:18.6648215Z t2492: Sending SIGINT to fgVXsPm. Job should now exit
2024-03-15T21:46:18.6648712Z 0.000s: job.submit {"userid":1001,"urgency":16,"flags":0,"version":1}
2024-03-15T21:46:18.6649136Z 0.012s: job.validate
2024-03-15T21:46:18.6649385Z 0.025s: job.depend
2024-03-15T21:46:18.6649638Z 0.025s: job.priority {"priority":16}
2024-03-15T21:46:18.6650190Z 0.031s: job.alloc {"annotations":{"sched":{"resource_summary":"rank[0-3]/core0"}}}
2024-03-15T21:46:18.6650663Z 0.061s: job.start
2024-03-15T21:46:18.6650985Z flux-job: task(s) exited with exit code 130
2024-03-15T21:46:18.6651334Z 0.482s: job.finish {"status":33280}
2024-03-15T21:46:18.6651630Z 0.032s: exec.init
2024-03-15T21:46:18.6651864Z 0.034s: exec.starting
2024-03-15T21:46:18.6652353Z 0.157s: exec.shell.init {"service":"1001-shell-fgVXsPm","leader-rank":0,"size":4}
2024-03-15T21:46:18.6652962Z 0.176s: exec.shell.start {"taskmap":{"version":1,"map":[[0,4,1,1]]}}
2024-03-15T21:46:18.6653808Z 0.474s: exec.shell.task-exit {"localid":0,"rank":1,"state":"Exited","pid":242340,"wait_status":2,"signaled":2,"exitcode":130}
2024-03-15T21:46:18.6654454Z 0.482s: exec.complete {"status":33280}
2024-03-15T21:46:18.6654766Z 0.482s: exec.done
2024-03-15T21:46:18.6655372Z 0.395s: flux-shell[0]:  WARN: exception: exception.c:49: shell rank 3 (on fv-az1492-456): Killed
2024-03-15T21:46:18.6655932Z Traceback (most recent call last):
2024-03-15T21:46:18.6656472Z   File "/tmp/flux-TKzLPm/jobtmp-0-fgVXsPm/critical.py", line 4, in <module>
2024-03-15T21:46:18.6656939Z     import flux
2024-03-15T21:46:18.6657343Z   File "/usr/src/src/bindings/python/flux/__init__.py", line 14, in <module>
2024-03-15T21:46:18.6657835Z     import flux.core.handle
2024-03-15T21:46:18.6658312Z   File "/usr/src/src/bindings/python/flux/core/handle.py", line 19, in <module>
2024-03-15T21:46:18.6658834Z     from flux.future import Future
2024-03-15T21:46:18.6659301Z   File "/usr/src/src/bindings/python/flux/future.py", line 16, in <module>
2024-03-15T21:46:18.6659865Z     from flux.util import check_future_error, interruptible
2024-03-15T21:46:18.6660418Z   File "/usr/src/src/bindings/python/flux/util.py", line 43, in <module>
2024-03-15T21:46:18.6660946Z     from flux.utils.parsedatetime import Calendar
2024-03-15T21:46:18.6661587Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/__init__.py", line 69, in <module>
2024-03-15T21:46:18.6662280Z     pdtLocales = dict([(x, load_locale(x)) for x in _locales])
2024-03-15T21:46:18.6662722Z                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-03-15T21:46:18.6663367Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/__init__.py", line 69, in <listcomp>
2024-03-15T21:46:18.6664058Z     pdtLocales = dict([(x, load_locale(x)) for x in _locales])
2024-03-15T21:46:18.6664502Z                            ^^^^^^^^^^^^^^
2024-03-15T21:46:18.6665874Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/pdt_locales/__init__.py", line 28, in load_locale
2024-03-15T21:46:18.6667082Z     mod = __import__(__name__, fromlist=[locale], level=0)
2024-03-15T21:46:18.6667755Z           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-03-15T21:46:18.6668306Z KeyboardInterrupt
2024-03-15T21:46:18.6668735Z Traceback (most recent call last):
2024-03-15T21:46:18.6669699Z   File "/tmp/flux-TKzLPm/jobtmp-1-fgVXsPm/critical.py", line 4, in <module>
2024-03-15T21:46:18.6670485Z     import flux
2024-03-15T21:46:18.6671167Z   File "/usr/src/src/bindings/python/flux/__init__.py", line 14, in <module>
2024-03-15T21:46:18.6671994Z     import flux.core.handle
2024-03-15T21:46:18.6672772Z   File "/usr/src/src/bindings/python/flux/core/handle.py", line 19, in <module>
2024-03-15T21:46:18.6673451Z     from flux.future import Future
2024-03-15T21:46:18.6673928Z   File "/usr/src/src/bindings/python/flux/future.py", line 16, in <module>
2024-03-15T21:46:18.6674505Z     from flux.util import check_future_error, interruptible
2024-03-15T21:46:18.6675064Z   File "/usr/src/src/bindings/python/flux/util.py", line 43, in <module>
2024-03-15T21:46:18.6675583Z     from flux.utils.parsedatetime import Calendar
2024-03-15T21:46:18.6676228Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/__init__.py", line 69, in <module>
2024-03-15T21:46:18.6676917Z     pdtLocales = dict([(x, load_locale(x)) for x in _locales])
2024-03-15T21:46:18.6677351Z                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-03-15T21:46:18.6677981Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/__init__.py", line 69, in <listcomp>
2024-03-15T21:46:18.6678670Z     pdtLocales = dict([(x, load_locale(x)) for x in _locales])
2024-03-15T21:46:18.6679077Z                            ^^^^^^^^^^^^^^
2024-03-15T21:46:18.6679760Z   File "/usr/src/src/bindings/python/flux/utils/parsedatetime/pdt_locales/__init__.py", line 28, in load_locale
2024-03-15T21:46:18.6680506Z     mod = __import__(__name__, fromlist=[locale], level=0)
2024-03-15T21:46:18.6680912Z           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-03-15T21:46:18.6681387Z   File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
2024-03-15T21:46:18.6681973Z   File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
2024-03-15T21:46:18.6682548Z   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
2024-03-15T21:46:18.6683261Z   File "<frozen importlib._bootstrap_external>", line 936, in exec_module
2024-03-15T21:46:18.6683849Z   File "<frozen importlib._bootstrap_external>", line 1032, in get_code
2024-03-15T21:46:18.6684466Z   File "<frozen importlib._bootstrap_external>", line 1131, in get_data
2024-03-15T21:46:18.6684889Z KeyboardInterrupt
2024-03-15T21:46:18.6685205Z t2492: Job exited with rc=130 (expecting 137 (128+9))
2024-03-15T21:46:18.6685596Z t2492: Unexpected job exit code 130
2024-03-15T21:46:18.6686289Z Mar 15 21:45:09.222754 broker.err[0]: rc2.0: /usr/src/t/issues/t2492-shell-lost.sh Exited (rc=1) 0.8s
2024-03-15T21:46:18.6686924Z flux-start: 0 (pid 235881) exited with rc=1
2024-03-15T21:46:18.6687583Z Mar 15 21:45:10.892181 broker.err[0]: rc2.0: /usr/src/t/issues/t2492-shell-lost.sh Exited (rc=1) 4.2s
2024-03-15T21:46:18.6688179Z not ok 32 - t2492-shell-lost
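The eventlog above reports statuses in two encodings, and decoding them makes the failure easier to read. Rank 1's task-exit event has wait_status 2, meaning the task was terminated directly by signal 2 (SIGINT). The job.finish status of 33280 is a normal-exit wait status with exit code 130 in the high byte (130 << 8 = 33280), and 130 is the shell-style 128+2 for SIGINT. The test instead expected 137, which is 128+9 for SIGKILL. A small sketch using only the Python standard library (the variable and function names are illustrative) makes this arithmetic explicit:

```python
import os
import signal

# Status values copied from the eventlog above.
task_wait_status = 2   # exec.shell.task-exit "wait_status" for rank 1
job_status = 33280     # job.finish / exec.complete "status"

# Rank 1's task was terminated directly by a signal rather than exiting normally.
assert os.WIFSIGNALED(task_wait_status)
task_sig = signal.Signals(os.WTERMSIG(task_wait_status))

# The job as a whole reports a normal exit; the exit code lives in the high byte.
assert os.WIFEXITED(job_status)
job_rc = os.WEXITSTATUS(job_status)


def signal_for_rc(rc: int) -> signal.Signals:
    """Map a shell-style exit code of 128 + N back to signal N."""
    return signal.Signals(rc - 128)


print(task_sig.name)                        # SIGINT
print(job_rc, signal_for_rc(job_rc).name)   # 130 SIGINT
print(137, signal_for_rc(137).name)         # 137 SIGKILL
```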

grondo, Mar 15 '24 23:03