noisepage
noisepage copied to clipboard
Deprecated tbb::task_scheduler_init
The tbb::task_scheduler_init
class is deprecated. Not really a pressing issue, and would not be noticeable if it weren't for the fact that it pops up some warnings during compilation thanks to a #pragma message()
in the tbb
headers.
The Intel documentation does not appear to suggest an alternative to tbb::task_scheduler_init
, however, it appears that we currently only use it for its tbb::task_scheduler_init::default_num_threads()
static member function - that is, we aren't using it to limit the concurrency of parallel operations. According to the documentation, this just returns the number of logical processors available to the current process (may differ from the total number of logical processors in the system, but likely a rare scenario, at least for where NoisePage is running right now) so we could likely replace this with std::thread::hardware_concurrency()
without altering behavior.
That said, it also probably wouldn't prove that difficult to roll our own implementation that takes the affinity mask of the current process into account.
task_scheduler_init
was replaced by task_arena
and global_control
see here: tbbrevamp.pdf.
There are a few places where we call the init, which defines how many threads tbb functions could use (on a really old branch this was causing issues where a benchmark would always use all the threads available to the system). We've already started replacing the task_scheduler_init
calls with task_arena
's (table_vector_iterator.cpp
) but the other calls have not been replaced. On my experimental branches, I've switched to using the num_parallel_execution_threads
instead of default_num_threads()
but I do hardcode these at the moment and override the default number of (1 if settings manager is enabled, -1 otherwise).
@eppingere and I used sched_getaffinity
to get the affinity mask of the current process in our 721 project last year, but unfortunately couldn't find an operation for OSX.
I think osx has one i just think we decided it wasn't worth it or something?
https://developer.apple.com/library/archive/releasenotes/Performance/RN-AffinityAPI/#//apple_ref/doc/uid/TP40006635-CH1-DontLinkElementID_2
Ah yep, we couldn't bind anything to a processor so just didn't proceed with it:
OS X does not export interfaces that identify processors or control thread placement—explicit thread to processor binding is not supported. Instead, the kernel manages all thread placement. Applications expect that the scheduler will, under most circumstances, run its threads using a good processor placement with respect to cache affinity.
I suspect we might need a TBB
refactor and that our database does not quite adhere to parallelization control in a few places. A number of TBB
parallel calls do not immediately make it clear that they are using the correct number of threads per our parallel execution settings (or any other specific value -- otherwise they default, as @turingcompl33t noted to the number of processors available), nor do they seem to be always recorded in metrics (SetNumConcurrentEstimate
). (see: https://github.com/cmu-db/noisepage/blob/fbd927dd8e35bab27a2c7e98568ded71af4a1ea9/src/execution/sql/aggregation_hash_table.cpp#L678, https://github.com/cmu-db/noisepage/blob/fbd927dd8e35bab27a2c7e98568ded71af4a1ea9/src/execution/sql/aggregation_hash_table.cpp#L719 for an example of each).
Indeed, TBB documentation notes:
If a parallel construct is used without task_scheduler_init object previously created, the scheduler will be initialized automatically with default settings, and will persist until this thread exits. Default concurrency level is defined as described in task_scheduler_init::initialize().
A discussion is needed, but I would propose that we rid ourselves of all calls to task_scheduler_init
and we wrap each TBB
parallel construct in a task_arena
which inherits the setting value with a method to be determined (possibly OSX/Linux macros and an appropriate affinity mask call). This does leave some questions (regarding module.cpp
off the top of my head see: tbb::task
), which uses TBB
task calls that I'm not sure utilize parallelism, but it might not be possible to visit settings at this point.