James Corbett

80 comments by James Corbett

Olaf Faaland wants to make use of prologue and epilogue scripts on Elmerfudd to: > (1) Run a script to clean up /dev/shm after a job, so that a user...
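
A minimal sketch of what a `/dev/shm` cleanup step in an epilogue could look like, assuming the goal is to remove whatever the job's owner left behind. The function name `clean_shm` and its parameters are hypothetical, and how the epilogue learns the owner's uid is not covered here.

```python
import os
import shutil
from pathlib import Path


def clean_shm(uid, shm_dir="/dev/shm"):
    """Remove top-level entries in shm_dir owned by uid.

    Hypothetical helper for illustration; returns the sorted names
    of the entries that were removed.
    """
    removed = []
    for entry in Path(shm_dir).iterdir():
        try:
            # lstat() so that a symlink is judged by its own owner,
            # not by the owner of its target.
            if entry.lstat().st_uid != uid:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)
            else:
                entry.unlink()
            removed.append(entry.name)
        except OSError:
            # The entry vanished or could not be removed; skip it
            # rather than fail the whole cleanup.
            continue
    return sorted(removed)
```

An epilogue would call something like `clean_shm(job_owner_uid)` after the job's processes have exited, so that no running task still holds the files open.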

The rabbit setup which motivated this discussion goes something like this: 1. User's executable finishes 2. Flux tells DWS (via the Kubernetes API) to unmount rabbit file systems on compute nodes 3...

I am wondering if there might be some trickery involved from the fact that jobs will have resources (rabbits) that aren't associated with nodes? For instance there will be cases...

@grondo and I talked about it on the coffee call and he proposed putting a partial job R in the free request rather than an idset. He noted that it...

There is an additional complication, which is that Flux can technically alert the user that their job has completed before the last condition has been reached (that the rabbit file...

> If the FS clean-up hangs, there wouldn't be any data loss, the user just wouldn't know that. Maybe a solution could be that once the rabbit software tells us...

> I might be misunderstanding, but the job manager should not issue the `clean` event and the job would not go into the INACTIVE state until _all_ resources have been...
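
The behavior described in the quote above can be pictured with a toy model, reading it as saying that `clean` is only emitted once every resource has been released. This is not Flux code: the `ToyJob` class, its state names other than INACTIVE, and the resource labels are all assumptions made for illustration.

```python
class ToyJob:
    """Toy model of a job that holds several resources (not the Flux API)."""

    def __init__(self, resources):
        self.pending = set(resources)  # resources not yet released
        self.events = []               # simplified eventlog: event names only
        self.state = "CLEANUP"         # assumed state while resources drain

    def free(self, resource):
        """Record that one resource was released; finish if it was the last."""
        if self.state == "INACTIVE":
            return
        self.pending.discard(resource)
        self.events.append("free")
        if not self.pending:
            # Only now is the final event posted and the job made inactive.
            self.events.append("clean")
            self.state = "INACTIVE"


job = ToyJob({"node0", "rabbit0"})
job.free("node0")    # compute node released, rabbit still held
job.free("rabbit0")  # last resource released, job becomes INACTIVE
```

In this model a hung rabbit cleanup would leave the job in its cleanup state indefinitely, which is the trade-off the surrounding comments are discussing.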

I like the eventlog approach as well! I have the code to make use of it whenever it could be implemented.

@vsoch those weren't my words; I was quoting from and summarizing a variety of emails, and the formatting got a bit messed up. Options 1-4 should have been options 1,...

@wihobbs established basic support for Flux within ATS in https://github.com/LLNL/ATS/pull/89. There is still more that could be done, but I think they are in a pretty good place.