Nameserver keeps 'old' agents which died or got killed
The nameserver currently keeps agents in its database/storage even after they have died or been killed (with SIGTERM, for example).
I currently use the following workaround:
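import sys
import time
import traceback

from osbrain import run_nameserver

NAMESERVER = '10.10.10.42:42422'
MAX_FAILURES = 4     # remove an agent once it has failed more pings than this
CHECK_INTERVAL = 5   # seconds between sweeps


def main():
    defunct_agents = {}  # alias -> number of consecutive failed pings
    ns = run_nameserver(NAMESERVER)
    while True:
        # Ping every registered agent and count consecutive failures.
        for alias in ns.agents():
            try:
                ns.proxy(alias).ping()
            except Exception:
                defunct_agents[alias] = defunct_agents.get(alias, 0) + 1
            else:
                # The agent answered, so reset its failure count.
                defunct_agents.pop(alias, None)
        # Unregister agents that failed too many times in a row.
        for alias in list(defunct_agents):
            if defunct_agents[alias] > MAX_FAILURES:
                try:
                    ns.remove(alias)
                    defunct_agents.pop(alias, None)
                except Exception:
                    traceback.print_exc(file=sys.stdout)
        time.sleep(CHECK_INTERVAL)


if __name__ == '__main__':
    main()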
An optional argument to run_nameserver (for example, run_nameserver(NAMESERVER, del_defunct_agents_after_seconds=60)) would be great.
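To make the request concrete, here is a minimal sketch of the time-based behaviour such an option might have. It is not part of osbrain: `DefunctTracker`, `sweep`, and the `timeout`, `clock`, `ping`, and `remove` parameters are all hypothetical names chosen for illustration, and the class only assumes it is given one callable that pings an alias and another that removes it.

```python
import time


class DefunctTracker:
    """Track how long each alias has been unreachable (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of continuous failure before removal
        self.clock = clock          # injectable clock, handy for testing
        self.first_failure = {}     # alias -> time of first consecutive failure

    def sweep(self, aliases, ping, remove):
        """Ping every alias once; remove those unreachable for >= timeout.

        Returns the list of aliases removed in this sweep.
        """
        now = self.clock()
        aliases = list(aliases)
        # Forget aliases that are no longer registered at all.
        for alias in set(self.first_failure) - set(aliases):
            del self.first_failure[alias]
        removed = []
        for alias in aliases:
            try:
                ping(alias)
            except Exception:
                # Remember when this run of failures started.
                started = self.first_failure.setdefault(alias, now)
                if now - started >= self.timeout:
                    remove(alias)
                    del self.first_failure[alias]
                    removed.append(alias)
            else:
                # A successful ping resets the failure window.
                self.first_failure.pop(alias, None)
        return removed
```

With the calls already used in the workaround above, one sweep would presumably look like `tracker.sweep(ns.agents(), lambda a: ns.proxy(a).ping(), ns.remove)`, run every few seconds. Measuring elapsed time instead of counting failed pings keeps the removal delay independent of the polling interval.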
Thanks for opening this issue.
Although we have that on our roadmap, heartbeating and monitoring will probably be left for a post-1.0.0 release. Meanwhile, it is up to users to implement any tricks they need for their application. :wink:
We will probably implement better handling of SIGINT/SIGTERM for 1.0.0, though. The case in which the agent simply "dies" will still need to be handled by the user.
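Until then, a process that receives SIGTERM can clean up after itself with the standard `signal` module. The following is only a user-side sketch, not the planned osbrain behaviour: `make_sigterm_handler`, `install_sigterm_handler`, and the `cleanup` callback are hypothetical names, and what `cleanup` does (for example, unregistering the agent from the name server) is left to the application.

```python
import signal
import sys


def make_sigterm_handler(cleanup):
    """Build a signal handler that runs `cleanup()` and then exits."""
    def handler(signum, frame):
        cleanup()      # application-specific, e.g. unregister the agent
        sys.exit(0)    # raises SystemExit so the process terminates
    return handler


def install_sigterm_handler(cleanup):
    """Install the handler for SIGTERM; call this from the main thread."""
    signal.signal(signal.SIGTERM, make_sigterm_handler(cleanup))
```

This only helps for signals that can be caught. A process killed with SIGKILL, or one that crashes outright, never runs the handler, so some external check such as the ping loop above is still needed for that case.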