How to share memory across threads?

Open camille-chelpi opened this issue 1 year ago • 7 comments

How can I share an object/array across threads without copying it across a channel/future, in order to minimise memory usage? With pthreads (PHP 7.2) we could use Threaded, but is it impossible to do with parallel?

<?php

$big_storage_elt = array();

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index,$shared_array){

        $shared_array[$index] = 'value '.$index;

        return true;

    },array($i,$big_storage_elt));
}

//waiting end of each thread
foreach($threads as $thread) $thread->value();

print_r($big_storage_elt);

I would like to see this as the result:

Array
(
    [0] => value 0
    [1] => value 1
    [2] => value 2
    [3] => value 3
    [4] => value 4
)

camille-chelpi avatar Dec 02 '24 11:12 camille-chelpi

Hey @camille-chelpi 👋

You are correct, this won't work anymore. You can read about the main change in philosophy from ext-pthreads to ext-parallel at https://www.php.net/manual/de/philosophy.parallel.php

One of the reasons is also visible in your code. When accessing $shared_array[$index] in the threads, you need some way to synchronise multiple threads accessing the shared memory. Otherwise you get a race condition at best (if the individual operations are atomic) or undefined behaviour at worst. Generally we'd try to avoid both 😉

Hope this helps.

realFlowControl avatar Dec 02 '24 12:12 realFlowControl

I have this solution, but it's not nice at all: I can't manage the concurrency between the unserialize() and serialize() calls inside a thread, and it also multiplies the memory used by the number of threads.

<?php

$big_storage_elt = new \Parallel\Sync(serialize(array()));

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index,$shared_array){

        $datas = unserialize($shared_array->get());
        $datas[$index] = 'value '.$index;
        $shared_array->set(serialize($datas));

        return true;

    },array($i,$big_storage_elt));
}
foreach($threads as $thread) $thread->value();

print_r(unserialize($big_storage_elt->get()));

camille-chelpi avatar Dec 02 '24 12:12 camille-chelpi
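
A possible way to address the concurrency half of the comment above, as a sketch under assumptions: parallel\Sync objects are documented as callable, and invoking one runs the given callable as a critical section. Wrapping the read-modify-write in that call should keep two threads from interleaving between get() and set(). This assumes a parallel release that ships Sync, and it does not reduce memory use, since every thread still unserializes its own copy of the array.

```php
<?php

// Sync holds a scalar value, so the array is stored in serialized form.
$shared = new \parallel\Sync(serialize([]));

$runtimes = [];
$futures  = [];
for ($i = 0; $i < 5; $i++) {
    // Keep each runtime referenced until its result has been collected.
    $runtimes[$i] = new \parallel\Runtime();
    $futures[$i]  = $runtimes[$i]->run(function (int $index, \parallel\Sync $shared) {
        // Invoking the Sync object runs the callable exclusively,
        // so the whole read-modify-write happens as one step.
        $shared(function () use ($index, $shared) {
            $data = unserialize($shared->get());
            $data[$index] = 'value ' . $index;
            $shared->set(serialize($data));
        });

        return true;
    }, [$i, $shared]);
}

// Wait for every task to finish.
foreach ($futures as $future) {
    $future->value();
}

$result = unserialize($shared->get());
ksort($result); // threads may finish in any order
print_r($result);
```

The critical section also serializes the threads around the shared value, so this only pays off when the work done outside of it dominates.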

IDK the problem you are solving, but this works:

<?php

$big_storage_elt = [];

$threads = array();
for($i=0;$i<5;$i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index){
        return [$index, 'value '.$index];

    },[$i]);
}

//waiting end of each thread
foreach($threads as $thread) {
    list($k, $v) = $thread->value();
    $big_storage_elt[$k] = $v;
}

print_r($big_storage_elt);

realFlowControl avatar Dec 02 '24 12:12 realFlowControl

Perhaps you would also like to take a look at the code in this repo: https://github.com/realFlowControl/1brc

realFlowControl avatar Dec 02 '24 12:12 realFlowControl

Yes, but returning the results through a future or channel costs much more memory and time when you deal with a huge array.

camille-chelpi avatar Dec 02 '24 12:12 camille-chelpi

It would be perfect if \Parallel\Sync had methods like these, which could also handle an array as the value: public function set($key, $value); public function get($key);

Something to study, I guess.

camille-chelpi avatar Dec 02 '24 12:12 camille-chelpi

I get your point and I'd like to learn more about the issue. For example, how big will the $big_storage_elt array grow (size in bytes, number of entries, ...)? I do get that the data each thread returns in the 1brc code is not much, even though the input file is 13 GB in size. Have you run a profiler to see how large the impact of copying memory from one thread to another and combining intermediate results is? Do you hit the memory limits of your machine/container?

realFlowControl avatar Dec 02 '24 12:12 realFlowControl
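
A rough way to get first numbers for the questions above, before reaching for a full profiler, is to measure a representative array with PHP's built-in memory functions. The sketch below uses a hypothetical stand-in of 100,000 short strings; the entry count and values are assumptions and should be replaced with real data. The serialized length only approximates the payload of the serialize()/Sync workaround, not the exact cost of parallel's own copying.

```php
<?php

// Hypothetical stand-in for the real data set.
$entries = 100000;

$before = memory_get_usage();

$big_storage_elt = [];
for ($i = 0; $i < $entries; $i++) {
    $big_storage_elt[$i] = 'value ' . $i;
}

// Approximate heap growth caused by building the array.
$arrayBytes = memory_get_usage() - $before;

// Size of the array once flattened into a string with serialize().
$serializedBytes = strlen(serialize($big_storage_elt));

printf("entries:          %d\n", count($big_storage_elt));
printf("array in memory:  %d bytes (approx.)\n", $arrayBytes);
printf("serialized size:  %d bytes\n", $serializedBytes);
printf("peak usage:       %d bytes\n", memory_get_peak_usage());
```

Multiplying the in-memory figure by the number of threads gives a crude upper bound to compare against the limits of the machine or container.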