ClusterODM
Feature request: Improve re-queuing logic
With split-merge, in the current implementation, available nodes are filled initially, but they don't get queried again for availability once they finish processing. If ClusterODM, after downloading a completed job, would check for freed nodes and send cached jobs out to them, jobs where the number of submodels exceeds the number of nodes could run substantially faster.
For jobs that have substantially more submodels than processing nodes, I have taken to killing the process when I get down to just a few submodels still running the OpenSfM stage, and then restarting it at the OpenSfM stage. The restart detects which submodels are complete and which still need to run, and then sends jobs out to all the available nodes, filling up the queue again. It's hackish, messy, and a volatile and dangerous way to do things, but it gets the job done.
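To make the request concrete, here's roughly what I'm picturing. This is just a sketch with made-up names (Node, SubmodelScheduler, on_job_downloaded), not the actual ClusterODM code:

```python
# Hypothetical sketch of the requested behavior; names here are invented
# for illustration and do not correspond to the real ClusterODM internals.
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    busy: bool = False

    def process(self, submodel):
        # Placeholder for "send this submodel to the node and start it".
        self.busy = True
        print(f"{self.name} <- {submodel}")


class SubmodelScheduler:
    def __init__(self, nodes, submodels):
        self.nodes = nodes
        self.pending = deque(submodels)   # submodels not yet sent anywhere

    def start(self):
        # Initial fill: every idle node gets a submodel, the rest stay cached.
        for node in self.nodes:
            self._dispatch_next(node)

    def on_job_downloaded(self, node):
        # Called after the results of a completed job have been downloaded.
        # This is the missing step: hand the freed node the next cached
        # submodel right away instead of leaving it idle.
        node.busy = False
        self._dispatch_next(node)

    def _dispatch_next(self, node):
        if self.pending and not node.busy:
            node.process(self.pending.popleft())


nodes = [Node("node-1"), Node("node-2")]
scheduler = SubmodelScheduler(nodes, [f"submodel_{i:04d}" for i in range(5)])
scheduler.start()                       # node-1 and node-2 each get a submodel
scheduler.on_job_downloaded(nodes[0])   # node-1 frees up and gets the next one
```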
> they don't get queried again for availability once they finish processing
This is strange; in any case, the logic at fault is probably not in ClusterODM, but in the LRE module in ODM. The LRE should take care of queuing tasks (filling up all available slots) and then wait until slots become available again.
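Roughly the pattern I'd expect there, sketched with generic Python primitives rather than the actual LRE code:

```python
# Simplified illustration of the slot-filling pattern, not the actual LRE
# implementation: keep every available slot busy and refill a slot with the
# next queued task as soon as one opens up.
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait


def run_all(tasks, num_slots, process):
    """Keep num_slots tasks in flight; refill each slot as it frees up."""
    pending = list(tasks)
    in_flight = set()
    with ThreadPoolExecutor(max_workers=num_slots) as pool:
        while pending or in_flight:
            # Fill every free slot from the queue of remaining tasks.
            while pending and len(in_flight) < num_slots:
                in_flight.add(pool.submit(process, pending.pop(0)))
            # Wait until at least one slot becomes available again.
            done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            for finished in done:
                finished.result()   # surface any errors from the finished task
```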
Would be good to document a test case (with a small dataset) that can be reproduced easily on a development machine.
You know I don't have any small datasets!
In all seriousness, I think all that's needed to replicate this is to set the split settings on any dataset so that the number of submodels exceeds the number of nodes. It's probably easiest to observe when the number of submodels is roughly twice the number of nodes, since the trailing one or two submodels then stand out.
And if you want me to open this on ODM instead, I can; I keep forgetting the LRE logic lives there, and ClusterODM tries to stay pretty agnostic.