overly verbose cleanup messages after allocation expired

Open • grondo opened this issue 1 year ago • 1 comment

Got this verbose set of messages after an allocation expiration of a flux alloc job on 36 nodes. Hm, maybe this is only so bad because the job happened to time out during rc3, but it seems to be fairly reproducible with flux alloc -N32 on fluke with v0.55.0:

1801.333s: job.exception type=timeout severity=0 resource allocation expired
Oct 04 14:51:47.398448 broker.err[4]: runat_abort cleanup (signal 14): No such file or directory
Oct 04 14:51:47.398544 broker.err[5]: runat_abort cleanup (signal 14): No such file or directory
Oct 04 14:51:47.401764 broker.err[23]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401720 broker.err[25]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402064 broker.err[24]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401094 broker.err[21]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401130 broker.err[19]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402151 broker.err[26]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402505 broker.err[28]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402332 broker.err[27]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401281 broker.err[20]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401274 broker.err[22]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.401013 broker.err[18]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402888 broker.err[29]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402893 broker.err[30]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.402789 broker.err[33]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.403173 broker.err[31]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.403351 broker.err[32]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.403517 broker.err[34]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.403700 broker.err[35]: rc3.0: /etc/flux/rc3 Hangup (rc=129) 0.0s
Oct 04 14:51:47.423834 broker.err[0]: rc2.0: /bin/bash Hangup (rc=129) 1783.1s
broker: broker module 'resource' was not properly shut down
broker: broker module 'kvs' was not properly shut down     
broker: broker module 'job-info' was not properly shut down
broker: broker module 'kvs' was not properly shut down     
broker: broker module 'resource' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'barrier' was not properly shut down
broker: broker module 'barrier' was not properly shut down
broker: broker module 'kvs' was not properly shut down    
broker: broker module 'content' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'resource' was not properly shut down 
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'job-info' was not properly shut down 
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs' was not properly shut down      
broker: broker module 'barrier' was not properly shut down
broker: broker module 'resource' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'kvs' was not properly shut down       
broker: broker module 'content' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'resource' was not properly shut down  
broker: broker module 'content' was not properly shut down 
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'content' was not properly shut down   
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'barrier' was not properly shut down   
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'resource' was not properly shut down  
broker: broker module 'kvs' was not properly shut down     
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'resource' was not properly shut down 
broker: broker module 'resource' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs' was not properly shut down      
broker: broker module 'resource' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'content' was not properly shut down   
broker: broker module 'resource' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'kvs' was not properly shut down     
broker: broker module 'resource' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'barrier' was not properly shut down
broker: broker module 'kvs' was not properly shut down    
broker: broker module 'kvs' was not properly shut down
broker: broker module 'barrier' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'job-info' was not properly shut down  
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'content' was not properly shut down   
broker: broker module 'kvs' was not properly shut down    
broker: broker module 'kvs' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'barrier' was not properly shut down   
broker: broker module 'resource' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'content' was not properly shut down  
broker: broker module 'job-info' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'barrier' was not properly shut down   
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'barrier' was not properly shut down  
broker: broker module 'kvs' was not properly shut down    
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'barrier' was not properly shut down   
broker: broker module 'job-info' was not properly shut down
broker: broker module 'barrier' was not properly shut down 
broker: broker module 'kvs' was not properly shut down    
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs' was not properly shut down      
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'resource' was not properly shut down  
broker: broker module 'job-info' was not properly shut down
broker: broker module 'barrier' was not properly shut down 
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs' was not properly shut down      
broker: broker module 'job-info' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'resource' was not properly shut down
broker: broker module 'resource' was not properly shut down
broker: broker module 'barrier' was not properly shut down 
broker: broker module 'job-info' was not properly shut down
broker: broker module 'barrier' was not properly shut down 
broker: broker module 'job-info' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'barrier' was not properly shut down   
broker: broker module 'content' was not properly shut down
broker: broker module 'job-info' was not properly shut down
broker: broker module 'kvs' was not properly shut down     
broker: broker module 'content' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'resource' was not properly shut down  
broker: broker module 'resource' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'kvs' was not properly shut down      
broker: broker module 'content' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'barrier' was not properly shut down  
broker: broker module 'job-info' was not properly shut down
broker: broker module 'barrier' was not properly shut down 
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'job-ingest' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down 
broker: broker module 'content' was not properly shut down  
broker: broker module 'job-info' was not properly shut down
broker: broker module 'content' was not properly shut down 
broker: broker module 'resource' was not properly shut down
broker: broker module 'kvs-watch' was not properly shut down
broker: broker module 'barrier' was not properly shut down  
broker: broker module 'content' was not properly shut down
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
broker: skipping 0MQ shutdown due to presumed module socket leak
[detached: session exiting]

grondo Oct 04 '23 23:10

I wonder if the broker could capture the fact that termination is due to a signal here, and skip all the "was not properly shut down" and "skipping 0MQ shutdown" errors, since this is already an abnormal shutdown?
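
To make the idea concrete, here is a rough sketch of the logic in Python. It is only a toy model, not flux-core code: the `ShutdownState` class, its method names, and the "terminated by ..." summary line are all made up for illustration. Only the two existing message strings are taken from the log above.

```python
import signal


class ShutdownState:
    """Toy stand-in for broker shutdown bookkeeping (hypothetical, not flux-core code)."""

    def __init__(self):
        # Signal number that triggered shutdown, or None for a normal shutdown.
        self.term_signal = None

    def note_signal(self, signum):
        # Remember only the first signal that started the shutdown.
        if self.term_signal is None:
            self.term_signal = signum

    def cleanup_messages(self, loaded_modules):
        """Return the messages to log for modules still loaded at exit."""
        if self.term_signal is not None:
            # Abnormal shutdown is already known: emit one summary line
            # (hypothetical wording) instead of one line per module.
            name = signal.Signals(self.term_signal).name
            return [f"broker: terminated by {name}, skipping module shutdown checks"]
        msgs = [
            f"broker: broker module '{m}' was not properly shut down"
            for m in loaded_modules
        ]
        if msgs:
            msgs.append("broker: skipping 0MQ shutdown due to presumed module socket leak")
        return msgs


state = ShutdownState()
state.note_signal(signal.SIGHUP)
for line in state.cleanup_messages(["kvs", "content", "resource"]):
    print(line)
```

With the signal recorded, each broker would print a single line rather than one line per loaded module plus the 0MQ message. Whether a summary line should be printed at all, or nothing, is an open choice.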

grondo Oct 05 '23 15:10