Robert Carson

Results: 60 comments by Robert Carson

@jameshcorbett so I'm currently testing my workflow / optimization tool on Summit and am running into some issues on the flux side of things. So, my python flux driver looks...

@jameshcorbett that seemed to work, and I guess I completely missed that I'd already declared the flux handle as fh when introducing some new code. I'm now running into an...

@grondo that seemed to help. The jobs are starting to run but then they fail immediately or hang, as it appears that Spectrum MPI isn't being picked up in the...

@jameshcorbett okay I think I figured out the issue. It looks like at some point I had added `flux-core` as a module to load in my job script as it appeared...

Thanks @grondo that appears to work quite well for my needs.

@jameshcorbett so as I'm working with an optimization problem the python workflow script can take a bit to finish running even for the small tests. It turns out that eventually...

@jameshcorbett so I just noticed that I'm seeing a bunch of `core.*` files generated with a number of the flux runs, which don't appear if I just do a `jsrun`...

@grondo so after poking at the core files using `ARM Forge`, it appears that the failure is in the `darshan-core` file which is called during `MPI_Finalize`. Particularly the output looks...

The error code returned from the program is 139 which definitely suggests that it's a `SIGSEGV`. It's still not clear to me what's causing it other than maybe some issue...
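As a quick check on the 139 reading, here is a minimal sketch that assumes the usual shell convention where a process killed by a signal reports 128 plus the signal number. The helper name `describe_exit_code` is hypothetical and is not part of the workflow or of Flux:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-style exit status into a signal name when possible.

    Assumption: statuses above 128 mean "terminated by signal (code - 128)",
    which is the convention used by common POSIX shells.
    """
    if code > 128:
        signum = code - 128
        try:
            # signal.Signals is the stdlib enum of signals known on this platform
            return signal.Signals(signum).name
        except ValueError:
            return f"unknown signal {signum}"
    return f"exit status {code}"


# 139 - 128 = 11, which is SIGSEGV on Linux
print(describe_exit_code(139))
```

This only decodes the status; it says nothing about why the segfault happens inside `MPI_Finalize`.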

@grondo and @jameshcorbett so I was finally able to get the workflow that was driving this out onto ExaConstit's repo after working through the details to preserve the git history...