dispy
dispy copied to clipboard
Misc fixes
- Connecting to dispyscheduler from the client (SharedJobCluster) with a keyfile/certfile didn't work.
- Why did dispynode block PING messages from the scheduler IP? I've removed the check.
- When the job submit rate was high enough, the _job object was reused, causing a clash of uid's. I've changed this particular case to using uuid.uuid4().hex, but you might want to go over the other cases and do something similar. Also, using os.urandom as a "hash" seems wrong. Better to calculate it with hash() if a hash is actually needed, or generate a UUID.
- When stdin was /dev/null, e.g. when started as a systemd service unit, the sys.stdin.readline() in the main loop caused a 100 % CPU busy loop.
I have fixed these issues but not ready to commit yet, as it turns out setting up SSL across dispy, dispynode and dispyscheduler had too many problems (although I have tested these at the time these features were added, over time these were not handled correctly). I will test them again and commit when I have reasonable confidence.
I would be very much in favour of patch number 3, because these uid clashes seem likely to happen when having many jobs, not all created at the same time.
I believe current implementation shouldn't cause duplicate UIDs, as jobs that are done are kept in 'done_jobs' until they are no longer used anywhere (all user callbacks are called). There are couple of (minor) issues with uuid. Creating 'id' is much cheaper than using uuid (I believe in versions of Python uuid can be quite expensive - I should look up again for reference), and using 'id' to store them in dictionary is cheaper than uuid strings (which would be converted to number with hashing). In case of issues with dispy, it is easy to check logs with 'id' numbers than uuid numbers.
However, if there is any issue with current implementation that is solved with uuid, we should definitely reconsider it.