
Provide a configurable/programmatic way to limit the size of the global market

Open · boldbyteboss opened this issue 5 years ago · 8 comments

Some of our customers use our software on very large "many core" systems (e.g. 900 cores), and typically set these machines up as shared resources for multiple users. These customers have an interest in limiting the amount of resources used by each instance of our application so that they can support as many users as possible and so that users cannot inadvertently overcommit the system.

Our application provides configuration that allows users to limit the amount of parallelism in these situations. Under the hood, we implement this by using global_control in order to restrict concurrency in the parts of our application that use TBB. This works as expected.

However, global_control does not limit the number of threads created by the global market during initialization. The upper limit on the market appears to be computed entirely from the hardware resources of the machine, with a hard-coded cap at 256. This means that on these "many core" systems, our application wastes a considerable amount of time and resources creating 256 threads for the market, even if we impose a much more modest limit via global_control. Our customers are understandably concerned about the resource consumption and are looking to us for a way to rein in thread creation.

There does not currently appear to be a way to do this with TBB. global_control and task_arena provide soft limits, but those only limit actual concurrency, not thread creation or resource usage. It looks like task_scheduler_init might limit the size of the market, but only if it is created before anything else happens, and it is deprecated. We've also found our application to be too large and complex to use task_scheduler_init reliably.

Perhaps I missed something hidden?

boldbyteboss · Jul 28 '20

The market should not create threads that were never requested. The hard limit only determines how many thread structures are allocated; the threads themselves are not created up front. Threads are created when parallelism is actually requested (e.g., by parallel_for). These requests respect the global_control value, so the number of threads actually created should not exceed the soft limit. Do you observe different behavior? Do threads continue to be created while the application runs? Another approach is to use process affinity masks: you can use taskset or similar functionality to limit the CPU resources available to each process.

alex-katranov · Jul 29 '20

I am pretty sure the behavior we are observing is that the full complement of threads is being created in the market at the time the scheduler is first initialized. I will go revisit this and confirm.

We have suggested operating-system-level mechanisms such as taskset, but most of our users are not system administrators, or they balk at making operating-system-level configuration changes for something they feel should just be a knob in our application. To be fair, taskset does more than simply limit CPU resources; it also pins the application to specific processors.

boldbyteboss · Jul 29 '20

> I am pretty sure the behavior we are observing is that the full complement of threads is being created in the market at the time the scheduler is first initialized. I will go revisit this and confirm.

Is it possible that some parallel algorithms are used before the global control is set?

alex-katranov · Jul 29 '20

> I am pretty sure the behavior we are observing is that the full complement of threads is being created in the market at the time the scheduler is first initialized. I will go revisit this and confirm.

@tmwnewbold, have you had a chance to check whether more threads are created than expected?

alex-katranov · Dec 16 '20

@tmwnewbold is this issue still relevant for you? Could you please respond?

isaevil · Oct 05 '22

I'm now mostly convinced that global_control does what we need, as long as it is constructed before any usage of TBB. We're suffering from a somewhat sprawling architecture that has made it extremely difficult to ensure that global_control is constructed before any such usage. Auto-initialization is great, but it makes it difficult to maintain the level of control we'd like across a large application.

boldbyteboss · Oct 05 '22

@alexey-katranov, is this issue still relevant?

arunparkugan · Aug 13 '24

I think we have it under control (barely) using global_control at this point. It works as advertised; our challenge is continuing to ensure we construct it early enough, before any parallelism occurs.

boldbyteboss · Aug 13 '24