
Built-in parallel algorithms?

Open vblanco20-1 opened this issue 5 years ago • 4 comments

I've been working with this library (works great!) on a personal project, and I've implemented a few parallel algorithms like parallel-for. Is there interest in making that a pull request? I'm mostly interested in what kind of API would work, to adapt my current algorithms to a style closer to the library's.

An example is parallel-for. Mine works a bit like this:

std::vector<float> nums = whatever();
ftl::TaskScheduler taskScheduler;
// Runs the lambda; its inputs are an integer index plus a container reference. The second argument is the batch size.
Parallel_For(&taskScheduler, 64, nums, [](int i, auto c) { c[i] *= 2.0f; });
// It actually waits in-line.
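To make the batching behavior concrete, here is a minimal sketch of the index math behind such a `Parallel_For`. All names here are hypothetical, and it uses `std::thread` instead of FTL fibers (a real FTL version would create one task per batch and wait on a counter), but the range-splitting logic is the same:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: split [0, count) into batches of batchSize and run
// each batch on its own worker. The user lambda receives the element index.
template <typename Func>
void ParallelForSketch(std::size_t count, std::size_t batchSize, Func func) {
    std::vector<std::thread> workers;
    for (std::size_t start = 0; start < count; start += batchSize) {
        std::size_t end = std::min(start + batchSize, count);
        workers.emplace_back([=]() {
            for (std::size_t i = start; i < end; ++i) {
                func(i); // invoke the user lambda once per index in this batch
            }
        });
    }
    // Wait in-line, mirroring the blocking behavior described above
    for (std::thread &t : workers) {
        t.join();
    }
}
```

With `count = 100` and `batchSize = 64`, this produces two batches: `[0, 64)` and `[64, 100)`.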

A possible implementation would be to use the new parallel std::algorithms, in which case it could be possible to implement some of those algorithms so they use this library internally.

std algorithms: https://en.cppreference.com/w/cpp/algorithm

The API could look like this (C++17 only):

std::vector<float> nums = whatever();
ftl::TaskScheduler taskScheduler;

std::for_each(taskScheduler.parallel_exec, nums.begin(), nums.end(), [](float &n) { n *= 2; });

Using the C++ parallel algorithms library could make this library easy to adopt: porting existing code would be straightforward, and the standard already provides the template and API.

The last option is to change the API a bit and make it a function on the task scheduler object, like:

taskScheduler->for_each(nums, 64, [](int i, auto c) { c[i] *= 2.0f; });

This option would give more flexibility in what algorithms are implemented, and it's more freeform since it doesn't use the STL API. The same applies to other things like sort (wildly useful for everyone), which I'm implementing myself, or even the typical producer/consumer pipeline (have two lambdas, one for producers and one for consumers, and have the return value of the first go into a parallel queue that the consumers drain).

If parallel algorithms are added, I think a way to control their allocation would be nice. A common approach is to give the algorithm function a memory block, use linear allocation to allocate the tasks from it, and then reset it when done. This could be a separate function overload.
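A linear (arena) allocator along those lines is easy to sketch. This is a hypothetical illustration, not FTL's API: the caller hands the algorithm a block, each allocation is just a pointer bump, and `Reset` reclaims everything at once when the tasks have finished.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical linear allocator: bump-allocate task records out of a
// caller-provided block; no per-allocation free is needed.
class LinearAllocator {
public:
    LinearAllocator(void *block, std::size_t size)
        : m_start(static_cast<std::uint8_t *>(block)), m_size(size), m_offset(0) {}

    void *Allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        // Round the current offset up to the requested (power-of-two) alignment
        std::size_t aligned = (m_offset + alignment - 1) & ~(alignment - 1);
        if (aligned + size > m_size) {
            return nullptr; // out of space; a real version might fall back to the heap
        }
        m_offset = aligned + size;
        return m_start + aligned;
    }

    // Called once the parallel algorithm completes: reclaims every allocation at once
    void Reset() { m_offset = 0; }

private:
    std::uint8_t *m_start;
    std::size_t m_size;
    std::size_t m_offset;
};
```

Because all task records share one lifetime (the duration of the parallel algorithm), a single reset replaces many individual frees, which is cheap and fragmentation-free.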

vblanco20-1 avatar Oct 06 '18 16:10 vblanco20-1

ParallelFor could be a really nice addition to the library.

I think the first method would be my choice. Something like: https://coliru.stacked-crooked.com/a/7b6b490a73d202bb

I'm not sure how we could make std::for_each work, since we need to pass in the ftl::TaskScheduler * pointer, but the std::for_each function signature doesn't allow it. In addition, I would like to keep the library C++11. The new version of std::for_each is C++17.

As a slight tangent, @BrodyHiggerson was also looking into an implementation, if you want to swap notes.

RichieSams avatar Oct 08 '18 19:10 RichieSams

@RichieSams Thanks for the answer. At the moment I have a very simple implementation here: https://github.com/vblanco20-1/Rasterizer/blob/master/Rasterizer/ParallelFor.h which is very similar to what you describe there. In my implementation, I found it very useful not to block directly; that's why I return a "task end" struct, which you can use to wait on the counter, and which frees the temporary memory when its destructor is called.

An example from my project:

TaskEnd<ParallelForTaskData> first_par = Parallel_For(taskScheduler,
        0, g_Framebuffer->Tiles.size(), 1,
        [&](auto i) {
            g_Framebuffer->DrawTile(&g_Framebuffer->Tiles[i]);
        });
first_par.Wait(0, true);

I use the non-blocking launch so I can launch multiple parallel-fors at once, or launch a second one when the counter of the first is at 50%.
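The overlapping-launch pattern can be sketched with standard futures. This is a stand-in, not FTL's counter API, and it lacks the partial-wait-at-50% feature: each launch returns a handle, and nothing blocks until the caller explicitly waits.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for the "task end" handle: std::async returns a
// std::future, so several parallel jobs can be in flight before anything
// blocks, similar to deferring the Wait() call on a TaskEnd.
std::future<void> LaunchDouble(std::vector<float> &data) {
    return std::async(std::launch::async, [&data]() {
        for (float &f : data) {
            f *= 2.0f;
        }
    });
}
```

Usage would look like launching two jobs back to back and only then waiting on both handles, so the two loops can run concurrently.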

I like your example more than mine, but doesn't lambda capture already do that?

vblanco20-1 avatar Oct 08 '18 20:10 vblanco20-1

Lambda capture can get super hairy, since scope becomes very wishy-washy between parents and children. In theory, I think lambda capture should work "fine". I would defer to someone like @martty, though.

RichieSams avatar Oct 22 '18 21:10 RichieSams

Scope will still work as long as you wait for parents to terminate before you bail from your task. Having a future system with RAII counter waiting would solve that problem.

cwfitzgerald avatar Oct 23 '18 02:10 cwfitzgerald