cwltool icon indicating copy to clipboard operation
cwltool copied to clipboard

use max_cores in taskQueue instead of system cores

Open kmavrommatis opened this issue 1 year ago • 2 comments

The class TascQueue accepts an argument thread_count that is used for the max size of a queue. In the MultithreadedJobExecutor class there is a variable defined (max_cores) that is getting its value from the available cores of the machine. Further in the same class when TaskQueue is called instead of using the max_cores it is using psutil.cpu_count(). I propose to use the max_cores variable when initializing the TaskQueue.

Use case. one can override the max_cores in the MutlithreadedJobExecutor after initialization and force cwltool to use only up to a certain number of cores.

kmavrommatis avatar Sep 06 '24 18:09 kmavrommatis

Hello, is there any update on this? Does this modification work - to your knowledge - with the overall architecture of the tool?

kmavrommatis avatar Oct 23 '24 12:10 kmavrommatis

hi @kmavrommatis, I'm taking a look at it.

It should already be the case that it will not start more than max_cores worth of work, because MultithreadedJobExecutor separately accounts for resource (cores/RAM) usage when deciding whether to put a job into the task queue. So if max_cores < cpu_count() I don't think it would make a difference in how many parallel jobs it starts. Although if you are seeing something different happen that your pull request addresses, please provide more information.

On the other hand if you want to start more than cpu_count() worth of work (i.e. max_cores > cpu_count()), then the number of parallel processes would be limited by the thread count being set to cpu_count() instead of the greater value of max_cores.

That said, it would be a useful feature to add command line options like --max-cores and --max-ram that would let you set your own limits instead of the defaults that are chosen for you. Is that the feature you are looking for?

tetron avatar Oct 23 '24 18:10 tetron

Hi @tetron my use case requires to use more cores than the ones that are physically available on the system (because the tasks running in the pipeline use the CPU very sparingly while waiting for network signals). I have modified cwltool to allow to specify the number of max cores (as you suggest) in both places in the TaskQueue and MultithreadedJobExecutor to a number that is higher than the actual cpu count (in my typical case it is 256 on a 32 core instance. Adding the arguments you suggest would work - provided that it is allowed to specify higher number of cores than the ones physically available to the instance :) Thanks

kmavrommatis avatar Nov 05 '24 21:11 kmavrommatis

@kmavrommatis just to confirm, are you requesting a fractional core in your CWL file? e.g. minCores: 0.125 ? Then it would just need to no longer limit the number of processes it launches to cpu_count().

tetron avatar Nov 05 '24 21:11 tetron

Hey @kmavrommatis , thank you for opening this PR.

Why do you want to run so many CWL Processes at the same time? Are they very short processes and you want to launch many to avoid delays from the setup and tear-down? Or are they IO-bound, so by over-subscribing the system it is still faster to be processing other jobs while some are waiting for IO?

Ah, I see that you are working with cwl-tes. I think you want to issue any many jobs as possible, because all of them are executed elsewhere.

Okay, this PR makes more sense to me now. It should produce identical behavior for everyone, except those that override ‎MultithreadedJobExecutor.max_cores to some other value.

mr-c avatar Nov 11 '24 07:11 mr-c

Hey @kmavrommatis , thank you for opening this PR.

Why do you want to run so many CWL Processes at the same time? Are they very short processes and you want to launch many to avoid delays from the setup and tear-down? Or are they IO-bound, so by over-subscribing the system it is still faster to be processing other jobs while some are waiting for IO?

Ah, I see that you are working with cwl-tes. I think you want to issue any many jobs as possible, because all of them are executed elsewhere.

Okay, this PR makes more sense to me now. It should produce identical behavior for everyone, except those that override ‎MultithreadedJobExecutor.max_cores to some other value.

Exactly right, cwl-tes is waiting for network events and it can handle many tasks at the same time, so it makes sense to make it assume that it can use many more cores than the host provides. Thanks

kmavrommatis avatar Nov 11 '24 11:11 kmavrommatis

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 84.01%. Comparing base (0b64935) to head (53b73e1). Report is 30 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2038   +/-   ##
=======================================
  Coverage   84.01%   84.01%           
=======================================
  Files          46       46           
  Lines        8302     8302           
  Branches     1957     1957           
=======================================
  Hits         6975     6975           
  Misses        853      853           
  Partials      474      474           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Nov 11 '24 12:11 codecov[bot]

A new cwltool release has been made with your fix; thank you @kmavrommatis !

https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20241112140730 / https://github.com/common-workflow-language/cwltool/releases/tag/3.1.20241112140730

mr-c avatar Nov 12 '24 15:11 mr-c