Errors may occur when running several scripts at the same time
Hi, @joa-quim,
As the title says, I use GMT.jl frequently on an HPC cluster, where I have several scripts running at the same time. I sometimes get an error. I cannot reproduce it reliably because it does not always occur. However, when I run the scripts one at a time, I never get an error.
The code below shows what each script does, but it is not a working example on its own.
gmtbegin("temp.png")
gmtset(MAP_FRAME_TYPE = "plain", FORMAT_GEO_MAP = "+D")
gmtset(GMT_VERBOSE = "e")
grdimage(grd, R = (110, 120, 10, 20), J = J, B = "afg", C = mycpt)
coast(W = "0.5p", B = "af", G = "white")
colorbar(C = mycpt, D = "jMR+w80%/0.4c+o-0.9c/0c+m+e",
         xaxis = (annot = :auto, ticks = :auto),
         yaxis = (annot = :auto, label = "$unit"),
         conf = (FONT_ANNOT = "16p",))
gmtend()
Some details about the error:
GMT [ERROR]: Shared GMT module not found: coast
ERROR: LoadError: Something went wrong when calling the module. GMT error number = 45
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] gmt(::String)
@ GMT ~/.julia/packages/GMT/C3mup/src/gmt_main.jl:165
[3] finish_PS_module(::Dict{Symbol, Any}, ::Vector{String}, ::String, ::Bool, ::Bool, ::Bool)
@ GMT ~/.julia/packages/GMT/C3mup/src/common_options.jl:4461
[4] _coast(cmd0::String, O::Bool, K::Bool, clip::String, d::Dict{Symbol, Any})
@ GMT ~/.julia/packages/GMT/C3mup/src/pscoast.jl:168
[5] #coast#846
@ ~/.julia/packages/GMT/C3mup/src/pscoast.jl:80 [inlined]
[6] coast (repeats 2 times)
@ ~/.julia/packages/GMT/C3mup/src/pscoast.jl:78 [inlined]
I saw that the gmtbegin() function calls resetGMT(), and that resetGMT() clears sessions. This may be what causes the error above.
https://github.com/GenericMappingTools/GMT.jl/blob/30478908ca4fe26bed92f5fffea1a897d8ae6e13/src/gmt_main.jl#L1487-L1500
Hi,
I think this is a consequence of GMT.jl not being thread safe (see this brief issue). If you look at the bottom of the GMT.jl file you'll see G_API[1] = GMT_Create_Session("GMT", 2, GMT_SESSION_BITFLAGS), and that G_API is used throughout the session until gmt_restart() is called. The problem with this is that one cannot have two competing processes (GMT module calls) using the internal C structures pointed to by G_API. We might get lucky in some cases, but that is not something we can trust to work in general. And because, while active, the internal structs store parameters used by previous calls, I am obliged to start from a clean state whenever gmtbegin is used. PyGMT goes way beyond this and starts a new API for every GMT call.
You may try converting your scripts to not use the modern mode syntax, in which case resetGMT() is not called, but the same general restriction on multiple independent accesses still holds.
But with classic mode you'll have another problem, which is the competition for the PostScript file name. This is set by init as
PSname[1] = TMPDIR_USR[1] * "/" * "GMTjl_" * TMPDIR_USR[2] * TMPDIR_USR[3] * ".ps"
which in principle could be reset before the parallel script calls.
The clean solution would be to start a new Julia session for each independent script call.
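As a rough illustration of "a new Julia session for each independent script call", here is a minimal sketch that builds one command per script, so each script would get its own Julia process and therefore its own G_API. The script names and the helper names make_cmd and launch_all are hypothetical. As the rest of this thread shows, running in separate processes did not by itself remove the error in the reported case.

```julia
# Hypothetical script names; replace with the real files.
scripts = ["plot_01.jl", "plot_02.jl", "plot_03.jl"]

# Base.julia_cmd() returns the command line of the currently running Julia binary,
# so each command below starts a fresh Julia process for one script.
make_cmd(script) = `$(Base.julia_cmd()) $script`
cmds = map(make_cmd, scripts)

# Start every process without blocking; the caller can then wait on each one.
launch_all(cmds) = [run(cmd; wait = false) for cmd in cmds]

# Usage (commented out because the placeholder scripts do not exist):
# procs = launch_all(cmds)
# foreach(wait, procs)
```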
I don't know if I made this clear. For example, I have 10 Julia files and every file includes the above code. I think this is different from the "thread safe" issue. In fact, after removing the clear_sessions call in resetGMT, the error no longer occurs and all figures look good.
How do you launch your parallel processes?
Can you try it with resetGMT(false) in gmtbegin? This will not call clear_sessions (though I think I tried that some time ago).
> How do you launch your parallel processes?
I submit all Julia scripts to compute nodes of the HPC. Then, all scripts run at the same time.
> Can you try it with resetGMT(false) in gmtbegin? This will not call clear_sessions (though I think I tried that some time ago).
Same as with resetGMT(): errors are still reported.
> I submit all julia scripts to compute nodes of HPC. Then, all scripts run at the same time.
OK, but I need to understand whether each script runs in a different Julia process or they all run in the same one.
> Same as resetGMT() , errors are reported.
I don't understand this. When the false argument is passed, the clear_sessions() call is not executed. So how come you still get the errors in that case, but not when you remove the clear_sessions call in resetGMT? I don't have access to an HPC to test this myself.
> I need to understand if each run in a different Julia process or they all run in the same one.
I run them in different Julia processes. That's why I think it has nothing to do with thread safety.
> Don't understand this. By passing the false argument the clear_sessions() call is not executed.
Yes, and the gmt_restart() is not executed either.
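Since the scripts run as separate Julia processes, one thing that may be worth trying is to give each process its own GMT session name before GMT.jl is loaded. The sketch below is untested against this error. GMT_SESSION_NAME is an environment variable that GMT uses for modern-mode sessions, but that GMT.jl honors it here is an assumption. The helper unique_session_name is hypothetical, and SLURM_JOB_ID is only set if the cluster uses Slurm.

```julia
# Hypothetical helper: build a name that is unique per scheduler job and per process.
# SLURM_JOB_ID exists only under Slurm (an assumption about the cluster);
# without it the name falls back to "nojob" plus the process ID.
function unique_session_name()
    jobid = get(ENV, "SLURM_JOB_ID", "nojob")
    return "gmtjl_$(jobid)_$(getpid())"
end

# Set the variable for this process only. Whether this prevents the
# "Shared GMT module not found" error is an assumption, not a confirmed fix.
ENV["GMT_SESSION_NAME"] = unique_session_name()

# Load GMT.jl only after the variable is set, then plot as usual:
# using GMT
# gmtbegin("temp.png")
# ...
# gmtend()
```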
Then I'm totally lost. If each script starts in its own process, then the resetGMT call should be absolutely innocuous (and unnecessary).
Try this in gmtbegin:
FirstModern[1] && resetGMT()
It's no longer clear to me why I had to create that global FirstModern, but since its default is false on a clean start, the above should prevent resetGMT() from being called in that case.
> Try this in gmtbegin.
> FirstModern[1] && resetGMT()
Errors are still reported.
I have a new mechanism in #master that should not call resetGMT in gmtbegin when it's run for the first time. Given how past attempts went, I'm not very hopeful that this solves your issue, but try it.