OMJulia.jl
ZMQ freezes on first command after session is created
Sometimes (quite rarely), the first call to sendExpression() after an OMCSession is created freezes.
Stacktrace of InterruptException (after CTRL-C):
[1] wait(::FileWatching._FDWatcher; readable::Bool, writable::Bool) at /build/julia/src/julia-1.5.0/usr/share/julia/stdlib/v1.5/FileWatching/src/FileWatching.jl:529
[2] wait at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/socket.jl:52 [inlined]
[3] _recv!(::ZMQ.Socket, ::ZMQ.Message) at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/comm.jl:75
[4] recv at /home/cslz90/.julia/packages/ZMQ/R3wSD/src/comm.jl:94 [inlined]
[5] sendExpression(::OMJulia.OMCSession, ::String) at /home/cslz90/.julia/packages/OMJulia/ZLXEs/src/OMJulia.jl:1014
[6] setupOMCSession(::String, ::String; quiet::Bool, checkunits::Bool) at /home/cslz90/.julia/packages/ModelicaScriptingTools/G5LLK/src/ModelicaScriptingTools.jl:374
setupOMCSession is my own code. It contains the following relevant lines; the second one is the line that shows up in the stack trace:
omc = OMCSession()
sendExpression(omc, "cd(\"$(moescape(outdir))\")")
This happens with the release version 0.1.0 of OMJulia. I believe I have also encountered it with the current version from the master branch in the past, but I cannot confirm that since I have switched back to the official released version some time ago.
I will try to introduce a sleep for 100ms between the creation of the Session and the first sendExpression() and report back whether this workaround is successful.
One additional note: Together with #32, one might get the impression that any sendExpression() call can freeze. However, across several hundred test runs over the last months, I never encountered a freeze between individual simulations, only at the very start or at the very end of the pipeline.
Update: I gradually increased the sleep duration from 100 ms to 500 ms, but still got occasional hangs. My next best guess is this suggestion from a related issue in ZMQ.jl: https://github.com/JuliaInterop/ZMQ.jl/issues/87#issuecomment-131153884
function avoidStartupFreeze(omc::OMCSession)
    status = :started
    timeout = 0.1
    while status != :received
        # send a simple command to OMC
        send(omc.socket, "getVersion()")
        # use julia task to allow recv to run into a timeout
        c = Channel()
        @async put!(c, (recv(omc.socket), :received))
        @async (sleep(timeout); put!(c, (nothing, :timedout)))
        data, status = take!(c)
        if status == :timedout
            @warn("getVersion() timed out in avoidStartupFreeze")
        end
    end
end
This sends getVersion() to the OMC until an answer is received in less than 100 ms. I am not sure if this (rather crude) timeout mechanism will work if ZMQ freezes as the issue is not reliably reproducible. I will report back when I encounter a case where the warning message is issued.
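The timeout idea can be looked at in isolation from ZMQ. The following is only a minimal sketch that uses nothing but Julia's Base; `with_timeout` is a hypothetical helper name (not part of OMJulia or ZMQ.jl), and the two closures are stand-ins for a fast and a hanging `recv` call. Unlike the function above, it uses a buffered channel so that the task that loses the race does not block forever in `put!`.

```julia
# Hypothetical helper (not part of OMJulia or ZMQ.jl): run `f` in a task and
# stop waiting for it after `timeout` seconds.
# Returns (result, :received) or (nothing, :timedout).
function with_timeout(f, timeout::Real)
    # Buffer size 2: both tasks can deposit their result without blocking,
    # even though only the first one is ever taken.
    c = Channel(2)
    @async put!(c, (f(), :received))
    @async (sleep(timeout); put!(c, (nothing, :timedout)))
    return take!(c)
end

# Stand-in for a call that answers immediately.
fast_data, fast_status = with_timeout(() -> "ok", 0.5)

# Stand-in for a call that hangs longer than the timeout.
slow_data, slow_status = with_timeout(() -> (sleep(1); "late"), 0.1)
```

Note that the timed-out call is not cancelled; it keeps running in its task, so a real `recv` that hangs would stay blocked on the socket.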
Update can be found here: https://github.com/THM-MoTE/ModelicaScriptingTools.jl/issues/9
The solution avoids freezes, but ZMQ crashes with a ZMQ.StateError.
Another update: I have now improved the function avoidStartupFreeze to a point where it simply discards the whole OMCSession and creates a new one when a timeout is detected.
function avoidStartupFreeze(omc::OMCSession)
    function reconnect(omc::OMCSession)
        try
            send(omc.socket, "quit()")
        catch e
        end
        return OMCSession()
    end
    status = :started
    timeout = 0.1
    while status != :received
        # send a simple command to OMC
        send(omc.socket, "getVersion()")
        # use julia task to allow recv to run into a timeout
        # idea from https://github.com/JuliaInterop/ZMQ.jl/issues/87#issuecomment-131153884
        c = Channel()
        @async put!(c, (recv(omc.socket), :received))
        @async (sleep(timeout); put!(c, (nothing, :timedout)))
        data, status = take!(c)
        if status == :timedout
            omc = reconnect(omc)
        end
    end
    return omc
end
So far this works great, although it is more of a workaround than a solution.
@CSchoel thank you for that workaround. I also observed the startup freeze, but additionally have problems when running thousands of simulations in a row - at some point the communication fails.
@DarkVador42 you're welcome. I am happy that it could be of help to someone else. :smile:
Is your error by any chance related to a ZMQ.StateError? This is the only additional problem that I encountered with this method and it only occurs during the creation of an OMCSession instance. I use a very crude solution for this which just recreates the session until there is no error and up until now it works. :shrug:
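For reference, "recreate the session until there is no error" could look roughly like the sketch below. This is not the exact code in use: `create_with_retry` is a hypothetical helper name, the attempt limit and delay are arbitrary assumptions, and `flaky` is only a stand-in for a constructor that fails the first two times.

```julia
# Hypothetical helper: call `create` until it succeeds, giving up after
# `max_attempts` tries and rethrowing the last error in that case.
function create_with_retry(create; max_attempts::Int = 5, delay::Real = 0.1)
    for attempt in 1:max_attempts
        try
            return create()
        catch err
            # Out of attempts: let the caller see the original error.
            attempt == max_attempts && rethrow()
            @warn "session creation failed, retrying" attempt err
            sleep(delay)
        end
    end
end

# Stand-in for a constructor that throws on the first two calls.
calls = Ref(0)
flaky() = (calls[] += 1; calls[] < 3 ? error("simulated failure") : :session)

session = create_with_retry(flaky; delay = 0.01)
```

With OMJulia loaded, the call would presumably be `create_with_retry(() -> OMCSession())`. Capping the number of attempts avoids an endless loop if OMC cannot be started at all.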
@CSchoel, yes, it also happens regularly when I create the OMCSession. Apart from that it also froze when I had thousands of model calls, where it was trapped inside a "wait" function of ZMQ - I cannot be more specific here, since I was not able to reproduce the error...