
Multithreaded Streams

Open jscheiny opened this issue 11 years ago • 2 comments

Allow parallelized streams (with a more limited set of operators).

jscheiny avatar Sep 04 '14 19:09 jscheiny

Great, Jonah! Can't wait for the parallelised streams :+1: :)

oldboldpilot avatar Sep 16 '14 20:09 oldboldpilot

I think your work on this project is great!

Parallelism is the obvious next big thing for a functional framework, since one of the biggest selling points of functional programming is giving developers robust and cheap access to parallel processing.

The reasons OO programming has been losing the 'holiness aura' it used to have are:

  • Objects are often stateful, and this doesn't play well with multi-threading: you often need to add a lot of interlocks (see the sketch after this list). CPUs stopped evolving vertically a while ago and now only evolve horizontally; we already have machines with 20 real cores (40 hardware threads) that spend most of their time starving for data.
  • Classes tend to mix data and logic, and this doesn't play well with distributed projects where the data needs to be transmitted (and stored) efficiently again and again. Network links have evolved much more slowly than CPUs: we still have Gigabit cables, 10Gb is the standard, nothing compared to how much CPUs have evolved.
  • The number of users a single service has to serve has exploded far beyond the capabilities of one machine: before, we had at most 10000K users; now it is hundreds of millions, thanks to the mobile revolution, social networks, the Internet of Things...
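
To make the first point concrete, here is a minimal standalone sketch (it does not use this library, and all names are just placeholders): shared mutable state forces every thread through a lock, while a pure per-element transform can be split across threads with no synchronization on the data itself.

#include <algorithm>
#include <cctype>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Stateful style: a shared object that every thread must lock before touching.
struct SharedLog {
    std::mutex mutex;
    std::vector<std::string> entries;
    void add(std::string entry) {
        std::lock_guard<std::mutex> lock(mutex); // interlock on every call
        entries.push_back(std::move(entry));
    }
};

// Functional style: a pure transform, safe to run on disjoint chunks without locks.
std::string normalize(std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

int main() {
    std::vector<std::string> words = {"Foo", "BAR", "baz", "FOO"};
    std::vector<std::string> out(words.size());

    // Each thread writes only to its own slice of `out`, so no mutex is needed.
    auto worker = [&](std::size_t begin, std::size_t end) {
        for (std::size_t i = begin; i < end; ++i) out[i] = normalize(words[i]);
    };
    std::thread t1(worker, 0, words.size() / 2);
    std::thread t2(worker, words.size() / 2, words.size());
    t1.join();
    t2.join();
}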

Hence the big U-turn of many developers, including myself, on OO.

Do you have ideas on how to implement multithreading simply?

What about the following?

// This sample takes a huge number of words and puts them into a normalized, sorted and deduplicated vector:
auto myWords = load_from_somewhere();

auto my_normalize = [](const std::string& word) -> std::string {
    ...
    return normalizedWord;
};
auto maxWorkers = 4;     // Use up to 4 CPUs.
auto bucketSize = 10000; // Every internal parallel task will handle 10k elements.

auto myVocabulary = myWords | parallel(map_(my_normalize) | distinct(),
                                       maxWorkers, bucketSize)
                            | distinct()
                            | to_vector();
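
Note that parallel, map_, distinct and to_vector above are only proposed names, not existing operators of this library. To illustrate the bucket idea, here is a rough standalone sketch built on std::async (every name is a placeholder): it splits the input into buckets of bucketSize, normalizes and deduplicates each bucket on its own task while keeping at most maxWorkers tasks in flight, and merges everything into a sorted, deduplicated vector.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <future>
#include <set>
#include <string>
#include <vector>

// Placeholder implementation of the bucketed "parallel map + distinct" idea.
std::vector<std::string> parallel_normalize(
        const std::vector<std::string>& input,
        const std::function<std::string(const std::string&)>& normalize,
        std::size_t maxWorkers,
        std::size_t bucketSize) {
    std::vector<std::future<std::set<std::string>>> pending;
    std::set<std::string> merged; // final sorted + deduplicated result

    auto drainOne = [&] {
        auto bucket = pending.front().get();
        merged.insert(bucket.begin(), bucket.end());
        pending.erase(pending.begin());
    };

    for (std::size_t start = 0; start < input.size(); start += bucketSize) {
        const std::size_t end = std::min(start + bucketSize, input.size());

        // Throttle: never keep more than maxWorkers buckets in flight.
        if (pending.size() >= maxWorkers) drainOne();

        pending.push_back(std::async(std::launch::async, [&, start, end] {
            std::set<std::string> local; // per-bucket distinct()
            for (std::size_t i = start; i < end; ++i) local.insert(normalize(input[i]));
            return local;
        }));
    }
    while (!pending.empty()) drainOne();

    return {merged.begin(), merged.end()};
}

With something like this behind the scenes, the pipeline above would boil down to parallel_normalize(myWords, my_normalize, maxWorkers, bucketSize), with the library's existing operators handling the final merge steps.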

If you want, I can help you with this, or we can work together to design the best solution.

rressi avatar May 23 '15 05:05 rressi