cron module stuck

Open grondo opened this issue 5 months ago • 0 comments

The cron module was stuck outside the reactor on a system. perf reported 99% of the broker's time in cronodate_next:

  - cronodate_next
      - 91.37% __GI_timelocal (inlined)
         - 90.96% __tzset
            - 90.49% tzset_internal
               - 88.44% __tzfile_read
                  - 87.73% _xstat (inlined)
                     + 82.57% entry_SYSCALL_64_after_hwframe
                       2.53% srso_alias_safe_ret
                       0.81% entry_SYSCALL_64
                       0.66% ___might_sleep
                 0.77% __GI_getenv (inlined)
      + 3.25% __mktime_internal
      + 2.93% idset_test

This is called by cronodate_remaining() when libev calls the cron datetime reschedule_cb.

I was able to attach to the broker and step through this a bit (before fumbling and killing the broker) and found the code was processing this cronodate entry:

(gdb) p *d
$7 = {item = {{set = 0x7fff4400e170, encoding = 0x7fff440121f0 "0"}, {
      set = 0x7fff44008c60, encoding = 0x7fff44012600 "0"}, {
      set = 0x7fff44013130, encoding = 0x7fff44013210 "2"}, {
      set = 0x7fff44004570, encoding = 0x7fff44013620 "1-31"}, {
      set = 0x7fff4400f3d0, encoding = 0x7fff44013a30 "0-11"}, {
      set = 0x7fff4400df70, encoding = 0x7fff44013e40 "0-1100"}, {
      set = 0x7fff4400f7f0, encoding = 0x7fff44014250 "1-5"}}}

This entry did seem to be stuck in cronodate_next(), though stepping through was a bit confusing: the struct tm seemed to be bouncing between two dates. Perhaps this is a corner case in tm_advance(). (Unfortunately, I let the gdb output scroll out of my buffer.)
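
As a rough illustration of how a search loop could protect itself against this kind of bouncing, here is a minimal sketch of a progress guard. It is not the flux-core implementation: `guarded_next()`, the callback types, the status codes, and the `demo_*` helpers are all hypothetical names made up for this example. It assumes candidate times can be compared as `time_t` values and that a correct advance step always moves strictly forward.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical callback types for this sketch (not flux-core API). */
typedef bool (*match_fn)(time_t t, void *arg);
typedef time_t (*advance_fn)(time_t t, void *arg);

enum search_status {
    SEARCH_FOUND = 0,         /* a matching time was found */
    SEARCH_NO_PROGRESS = -1,  /* advance step did not move forward */
    SEARCH_EXHAUSTED = -2     /* step budget used up without a match */
};

/* Search forward from 'start' for a time accepted by 'match'.
 * Gives up if 'advance' fails to move strictly forward (for example
 * when it bounces between two dates) or after 'max_steps' iterations.
 */
int guarded_next(time_t start, match_fn match, advance_fn advance,
                 void *arg, int max_steps, time_t *result)
{
    assert(match && advance && result && max_steps > 0);

    time_t t = start;
    for (int i = 0; i < max_steps; i++) {
        if (match(t, arg)) {
            *result = t;
            return SEARCH_FOUND;
        }
        time_t next = advance(t, arg);
        if (next <= t)
            return SEARCH_NO_PROGRESS;
        t = next;
    }
    return SEARCH_EXHAUSTED;
}

/* Demo helpers, only here to exercise the guard. */

/* Matches when t equals the target pointed to by arg. */
bool demo_match(time_t t, void *arg)
{
    return t == *(time_t *)arg;
}

/* Well-behaved step: always moves forward by one second. */
time_t demo_step(time_t t, void *arg)
{
    (void)arg;
    return t + 1;
}

/* Misbehaving step: bounces between 10 and 20 forever. */
time_t demo_bounce(time_t t, void *arg)
{
    (void)arg;
    return t == 10 ? 20 : 10;
}
```

With a guard like this, a step function that oscillates would surface as an error the caller can log, instead of spinning outside the reactor. The step budget is a second safety net for the case where the step keeps moving forward but never reaches a match.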

grondo, Sep 16 '24 21:09