Robert Carson
@jameshcorbett so I'm currently testing my workflow / optimization tool on Summit and am running into some issues on the flux side of things. So, my python flux driver looks...
@jameshcorbett that seemed to work, and I guess I completely missed that I'd already declared the flux handle as fh when introducing some new code. I'm now running into an...
@grondo that seemed to help. The jobs are starting to run, but then they fail immediately or hang, as it appears Spectrum MPI isn't being picked up in the...
@jameshcorbett okay, I think I figured out the issue. It looks like at some point I had added `flux-core` as a module to load in my job script as it appeared...
Thanks @grondo that appears to work quite well for my needs.
@jameshcorbett so as I'm working with an optimization problem, the Python workflow script can take a while to finish running, even for the small tests. It turns out that eventually...
@jameshcorbett so I just noticed that a bunch of `core.*` files are being generated by a number of the Flux runs, which doesn't happen if I just do a `jsrun`...
@grondo so after poking at the core files using `ARM Forge`, it appears that the failure is in the `darshan-core` file, which is called during `MPI_Finalize`. In particular, the output looks...
The exit code returned from the program is 139, which strongly suggests a `SIGSEGV` (139 = 128 + 11, and signal 11 is `SIGSEGV`). It's still not clear to me what's causing it other than maybe some issue...
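For reference, shells report a process killed by signal N with an exit status of 128 + N, so 139 maps back to signal 11. The small helper below is a sketch of that decoding, assuming the 128 + N shell convention; `describe_exit_code` is a hypothetical name and not part of Flux or `jsrun`.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: statuses above 128 follow the shell convention of
    128 + N, where N is the number of the signal that killed the process.
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals maps a signal number to its symbolic name
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            # Number does not correspond to a signal known on this platform
            return f"killed by unknown signal {signum}"
    return f"exited normally with status {code}"


print(describe_exit_code(139))
```

Running it on 139 prints `killed by SIGSEGV (signal 11)`, which is consistent with a segfault inside `MPI_Finalize` rather than an ordinary error return from the application.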
@grondo and @jameshcorbett so I was finally able to get the workflow that was driving this out onto ExaConstit's repo after working through the details to preserve the git history...