FiberTaskingLib
Built-in parallel algorithms?
I've been working with this library (works great!) on a personal project, and I've implemented a few parallel algorithms like parallel-for. Is there interest in making that a pull request? I'm mostly interested in what kind of API would work, so I can adapt my current algorithms to a style closer to the library's.

An example is parallel-for. Mine works a bit like this:
```cpp
std::vector<float> nums = whatever();
ftl::TaskScheduler taskScheduler;

// Runs the lambda; the inputs to it are an integer index plus a reference
// to the container. The second argument is the batch size.
Parallel_For(&taskScheduler, 64, nums, [](int i, auto &c) { c[i] *= 2.0f; });
// It actually waits in-line.
```
A possible implementation would be to use the new parallel std::algorithms, in which case it could be possible to implement some of those algorithms so they use this library internally.

Std algorithms: https://en.cppreference.com/w/cpp/algorithm

The API could look like this (C++17 only):
```cpp
std::vector<float> nums = whatever();
ftl::TaskScheduler taskScheduler;

std::for_each(taskScheduler.parallel_exec, nums.begin(), nums.end(),
              [](float &n) { n *= 2; });
```
Using the C++ parallel algorithms as the interface could make the library easier to use: existing code would be simpler to port over, and the standard already provides the template and the API.
The last option is to change the API a bit and make it a member function on the task scheduler object, like:

```cpp
taskScheduler->for_each(nums, 64, [](int i, auto &c) { c[i] *= 2.0f; });
```
This option would give more flexibility about which algorithms are implemented, and it's more freeform since it doesn't use the STL API. The same goes for other algorithms like sort (wildly useful for everyone), which I'm implementing myself, or even the typical producer/consumer pipeline: have two lambdas, one for producers and the other for consumers, and have the return value of the first go into a parallel queue that the consumers grab items from.
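To illustrate the pipeline shape, a rough sketch in plain C++11, independent of the scheduler (BlockingQueue, ProduceItem, and ConsumeItem are hypothetical names; a real fiber-based version would wait on the scheduler's counters rather than block a worker thread in a condition variable):

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical helper, not FiberTaskingLib API: a small closable queue that
// producer tasks push into and consumer tasks pop from.
template <typename T>
class BlockingQueue {
public:
    void Push(T value) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push(std::move(value));
        }
        m_cv.notify_one();
    }

    // Returns false once Close() has been called and the queue is drained.
    bool Pop(T &out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_queue.empty() || m_closed; });
        if (m_queue.empty()) {
            return false;
        }
        out = std::move(m_queue.front());
        m_queue.pop();
        return true;
    }

    void Close() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_closed = true;
        }
        m_cv.notify_all();
    }

private:
    std::queue<T> m_queue;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    bool m_closed = false;
};

// Shape of the two lambdas: producers push their return values, consumers
// loop until the queue is closed and drained.
// BlockingQueue<float> queue;
// auto producer = [&](int i) { queue.Push(ProduceItem(i)); };
// auto consumer = [&] { float item; while (queue.Pop(item)) ConsumeItem(item); };
```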
If parallel algorithms are added, I think a way to control their allocation would be nice. A common approach is to give the algorithm function a memory block and just use linear allocation for the tasks, then reset the block when done. It could be a separate function declaration.
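A minimal sketch of that allocation scheme (all names hypothetical, and payloads assumed trivially destructible since Reset() never runs destructors):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical bump allocator: task payloads come out of a caller-provided
// block, and everything is reclaimed at once with Reset() after the
// parallel algorithm has been waited on.
class LinearAllocator {
public:
    LinearAllocator(void *block, size_t size)
        : m_start(static_cast<uint8_t *>(block)), m_size(size), m_offset(0) {}

    template <typename T>
    T *Allocate() {
        size_t offset = Align(m_offset, alignof(T));
        assert(offset + sizeof(T) <= m_size && "Task memory block exhausted");
        m_offset = offset + sizeof(T);
        return new (m_start + offset) T();
    }

    // Only valid once every task using this memory has finished.
    void Reset() {
        m_offset = 0;
    }

private:
    static size_t Align(size_t value, size_t alignment) {
        return (value + alignment - 1) & ~(alignment - 1);
    }

    uint8_t *m_start;
    size_t m_size;
    size_t m_offset;
};
```

A Parallel_For overload could take one of these for its per-batch task data instead of hitting the heap, and call Reset() after the wait returns.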
ParallelFor could be a really nice addition to the library.
I think the first method would be my choice. Something like: https://coliru.stacked-crooked.com/a/7b6b490a73d202bb
I'm not sure how we could make std::for_each work, since we need to pass in the ftl::TaskScheduler * pointer, but the std::for_each function signature doesn't allow it. In addition, I would like to keep the library C++11; the new version of std::for_each is C++17.
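For reference, the first method boils down to something like the sketch below (simplified, not the coliru code; the ftl::Task / AtomicCounter / AddTasks / WaitForCounter usage follows the README, but treat the exact signatures as assumptions):

```cpp
#include "ftl/atomic_counter.h"
#include "ftl/task_scheduler.h"

#include <cstddef>
#include <vector>

// Hypothetical C++11 free function: one task per batch, waiting in-line on
// the counter, with the scheduler passed explicitly since std::for_each has
// no way to carry it.
template <typename It, typename F>
void ParallelForEach(ftl::TaskScheduler *scheduler, It first, It last,
                     size_t batchSize, F func) {
    struct Chunk {
        It begin;
        It end;
        F *func;

        static void Run(ftl::TaskScheduler *, void *arg) {
            Chunk *chunk = static_cast<Chunk *>(arg);
            for (It it = chunk->begin; it != chunk->end; ++it) {
                (*chunk->func)(*it);
            }
        }
    };

    // Split [first, last) into batches of at most batchSize elements.
    std::vector<Chunk> chunks;
    for (It it = first; it != last;) {
        It chunkEnd = it;
        for (size_t i = 0; i < batchSize && chunkEnd != last; ++i) {
            ++chunkEnd;
        }
        chunks.push_back(Chunk{it, chunkEnd, &func});
        it = chunkEnd;
    }

    // One ftl::Task per batch; chunk addresses are stable from here on.
    std::vector<ftl::Task> tasks(chunks.size());
    for (size_t i = 0; i < chunks.size(); ++i) {
        tasks[i] = {&Chunk::Run, &chunks[i]};
    }

    ftl::AtomicCounter counter(scheduler);
    scheduler->AddTasks(static_cast<unsigned>(tasks.size()), tasks.data(), &counter);
    scheduler->WaitForCounter(&counter, 0); // waits in-line
}
```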
As a slight tangent, @BrodyHiggerson was also looking into an implementation, if you want to swap notes.
@RichieSams Thanks for the answer. At the moment I have a very simple implementation here: https://github.com/vblanco20-1/Rasterizer/blob/master/Rasterizer/ParallelFor.h which is very similar to what you have there. In my implementation, I found it very useful to not block directly, which is why I return a "task end" struct: you can use it to wait on the counter, and it removes the temporary memory when its destructor is called.
Example from my project:

```cpp
TaskEnd<ParallelForTaskData> first_par = Parallel_For(taskScheduler,
    0, g_Framebuffer->Tiles.size(), 1,
    [&](auto i) {
        g_Framebuffer->DrawTile(&g_Framebuffer->Tiles[i]);
    });

first_par.Wait(0, true);
```
I use the "non-blocking" launch so I can launch multiple parallel-fors at once, or launch a second one when the counter of the first is at 50%.
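The idea boils down to something like this (a simplified sketch, not the code from the link; the member names are hypothetical and the WaitForCounter signature is assumed):

```cpp
#include "ftl/atomic_counter.h"
#include "ftl/task_scheduler.h"

#include <memory>

// Sketch of the TaskEnd idea: the struct keeps the counter and the temporary
// per-task data alive, so the launch itself never has to block; the memory is
// released automatically when the struct goes out of scope.
template <typename TaskData>
struct TaskEnd {
    ftl::TaskScheduler *scheduler;
    std::unique_ptr<ftl::AtomicCounter> counter;
    std::unique_ptr<TaskData> data; // freed in the destructor

    // value == 0 waits for full completion; a larger value returns once only
    // that many batches remain, which is how the "50%" overlap trick works.
    void Wait(unsigned value, bool pinToCurrentThread) {
        scheduler->WaitForCounter(counter.get(), value, pinToCurrentThread);
    }
};
```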
I like your example more than mine, but doesn't lambda capture already do that?
Lambda capture can get super hairy, since scope becomes very wishy washy between parents and children. In theory, I think lambda capture should work "fine". I would defer to someone like @martty though.
Scope will still work as long as you wait for your child tasks to terminate before you bail from your own task. Having a future system with RAII counter waiting will solve that problem.
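To make the lifetime hazard concrete (LaunchTask, TaskHandle, and Consume are hypothetical stand-ins; the point is the capture's lifetime, not the API):

```cpp
#include "ftl/task_scheduler.h"

// Hypothetical helpers for illustration only.
struct TaskHandle {
    void Wait(unsigned value, bool pinToCurrentThread);
};
template <typename F>
TaskHandle LaunchTask(ftl::TaskScheduler *scheduler, F &&func);
void Consume(int value);

void Broken(ftl::TaskScheduler *scheduler) {
    int local = 42;
    // The child captures `local` by reference, but nothing stops Broken()
    // from returning before the child runs, so the capture dangles.
    LaunchTask(scheduler, [&] { Consume(local); });
} // undefined behavior if the task runs after this point

void Fine(ftl::TaskScheduler *scheduler) {
    int local = 42;
    TaskHandle end = LaunchTask(scheduler, [&] { Consume(local); });
    end.Wait(0, true); // the parent outlives the child, so the capture stays valid
} // an RAII TaskEnd could perform this wait in its destructor instead
```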