do not just check for currently available RAM on task start

Open: ultra-sonic opened this issue 2 years ago • 1 comment

Hi Timur,

We are currently trying to make our farm more efficient by running two or more tasks per node in parallel. The main issue we are facing right now is how Afanasy checks whether the render node has enough RAM left to run a task. In our case, the memory needed by render jobs always increases over time, so checking only at the start of a new task is not safe. We have implemented an out-of-memory check in the parser that kills tasks once they exceed their memory limit (and if there is no more RAM left on the system).

I would suggest checking whether the total system memory minus the memory required by all running tasks on the render node is large enough to cover the memory needed by the next task that should be assigned. Does this make sense?
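The proposed rule can be written as a single inequality. Here $M_{\text{total}}$ is the node's total RAM, $m_i$ the memory declared by the $i$-th of the $n$ running tasks, and $m_{\text{next}}$ the memory declared by the candidate task; these symbols are introduced only for this illustration.

$$
m_{\text{next}} \le M_{\text{total}} - \sum_{i=1}^{n} m_i
$$

With hypothetical numbers, $M_{\text{total}} = 64$ GB and two running tasks with $m_1 = m_2 = 24$ GB leave $64 - 48 = 16$ GB unreserved, so a task declaring 20 GB is rejected and one declaring 12 GB is accepted, regardless of how little memory the running tasks happen to use at that moment.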

I think the easiest way would be to change this line https://github.com/CGRU/cgru/blob/69eb55beaaedaf35996face23bb09373c88f5181/afanasy/src/server/block.cpp#L230 to something like this pseudocode:

totalMemNeededForRunningTasks = 0
for task in render->allTasks:
    totalMemNeededForRunningTasks += task->getNeedMemory()
if (m_data->getNeedMemory() > render->getHostRes().mem_total_mb - totalMemNeededForRunningTasks)
    // not enough unreserved memory: do not start this task on this render

Can you turn that pseudocode into real C++ and post it here? I could then test it at RISE and see how well it works.

This would turn the neededMemory property of blocks into a "max. memory" limit, which of course needs to be monitored by afrender (or by the parser, as we already do at RISE).

ultra-sonic, Jul 04 '22 12:07

Hi Oliver, you can use tickets for this. Tickets are like "named" capacity types; see https://cgru.readthedocs.io/en/latest/afanasy/tickets.html

timurhai, Jul 05 '22 08:07