How to share memory across threads?
How can I share an object/array across threads without copying it through a channel/future, in order to minimise memory usage? With pthreads (PHP 7.2) we could use Threaded, but is it impossible to do this with parallel?
<?php
$big_storage_elt = array();
$threads = array();
for ($i = 0; $i < 5; $i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index, $shared_array){
        $shared_array[$index] = 'value '.$index;
        return true;
    }, array($i, $big_storage_elt));
}
// wait for each thread to finish
foreach ($threads as $thread) $thread->value();
print_r($big_storage_elt);
I would like to get this result:
Array
(
[0] => value 0
[1] => value 1
[2] => value 2
[3] => value 3
[4] => value 4
)
Hey @camille-chelpi 👋
You are correct, this won't work anymore. You can read about the main change in philosophy from ext-pthreads to ext-parallel at https://www.php.net/manual/de/philosophy.parallel.php
One of the reasons is also in your code. When accessing $shared_array[$index] in the threads, you need some way to synchronise multiple threads accessing the shared memory, otherwise you would get at least a race-condition (as long as operations are atomic) or undefined behaviour. Generally we'd try to avoid both 😉
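To make the point concrete, here is a minimal sketch (assuming ext-parallel on a ZTS build of PHP) showing that an array passed to a task is copied into the runtime, so writes inside the task never reach the parent's array. The variable names are only illustrative.

```php
<?php
// Sketch only: requires ext-parallel (ZTS build of PHP).
use parallel\Runtime;

$parentArray = [];

$runtime = new Runtime();
$future = $runtime->run(function (int $index, array $copy) {
    // $copy is the task's own copy of the argument
    $copy[$index] = 'value ' . $index;
    return $copy;
}, [0, $parentArray]);

// The modified copy only comes back through the future
$fromTask = $future->value();

var_dump(count($parentArray)); // the parent's array is still empty
var_dump($fromTask);
```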
Hope this helps.
I have this solution, but it's not nice at all: I can't manage the concurrency between the unserialize() and serialize() calls inside a thread, and it also multiplies the memory used by the number of threads.
<?php
$big_storage_elt = new \Parallel\Sync(serialize(array()));
$threads = array();
for ($i = 0; $i < 5; $i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index, $shared_array){
        $datas = unserialize($shared_array->get());
        $datas[$index] = 'value '.$index;
        $shared_array->set(serialize($datas));
        return true;
    }, array($i, $big_storage_elt));
}
foreach ($threads as $thread) $thread->value();
print_r(unserialize($big_storage_elt->get()));
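One way the unserialize()/serialize() race might be handled is to run the whole read-modify-write inside the Sync object's critical section, which parallel\Sync exposes through __invoke(). This is a sketch, not a recommendation: it assumes that calling get() and set() inside the critical closure is safe (the documented wait()/notify() pattern suggests so), and it does nothing about the memory cost, since every update still serializes the full array.

```php
<?php
// Sketch only: requires ext-parallel (ZTS build of PHP).
use parallel\Runtime;
use parallel\Sync;

// Sync holds scalar values only, hence the serialized string
$shared = new Sync(serialize([]));
$runtimes = [];
$futures = [];

for ($i = 0; $i < 5; $i++) {
    $runtime = new Runtime();
    $runtimes[] = $runtime; // keep the runtime alive until its task is done
    $futures[] = $runtime->run(function (int $index, Sync $shared) {
        // Invoking the Sync object enters its critical section, so only
        // one thread at a time performs the read-modify-write below.
        $shared(function () use ($shared, $index) {
            $data = unserialize($shared->get());
            $data[$index] = 'value ' . $index;
            $shared->set(serialize($data));
        });
        return true;
    }, [$i, $shared]);
}

// Wait for every task to finish
foreach ($futures as $future) {
    $future->value();
}

$result = unserialize($shared->get());
ksort($result); // insertion order depends on thread scheduling
print_r($result);
```

Without the critical section, two threads can both read the same serialized string and the later set() silently drops the other thread's entry.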
IDK the problem you are solving, but this works:
<?php
$big_storage_elt = [];
$threads = array();
for ($i = 0; $i < 5; $i++)
{
    $thread = new \Parallel\Runtime();
    $threads[] = $thread->run(function($index){
        return [$index, 'value '.$index];
    }, [$i]);
}
// wait for each thread to finish
foreach ($threads as $thread) {
    list($k, $v) = $thread->value();
    $big_storage_elt[$k] = $v;
}
print_r($big_storage_elt);
Perhaps you would also like to take a look at the code in this repo: https://github.com/realFlowControl/1brc
Yes, but using a future or a channel costs much more memory and adds delay when you deal with a huge array.
It would be perfect if \Parallel\Sync had methods like these, which could also handle arrays as values: public function set($key, $value); public function get($key);
something to study I guess
I get your point and I'd like to learn more about the issue. For example, how big will the $big_storage_elt array grow (size in bytes, number of entries, ...)? I do get that the data each thread returns in the 1brc code is not much, even though the input file is 13 GB in size. Have you run a profiler to see how large the impact of copying memory from one thread to another and combining intermediate results is? Do you hit the memory limits of your machine/container?
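Short of a full profiler run, a rough first measurement could look like the sketch below. It times one task that returns a large array through a future and reports how much the parent's memory grew. The entry count is a made-up placeholder, and I am assuming that memory_get_usage() only reflects the calling thread's allocator, so the worker's own usage is not included.

```php
<?php
// Sketch only: requires ext-parallel (ZTS build of PHP).
use parallel\Runtime;

$entries = 100000; // hypothetical size, adjust to your real workload

$startTime = hrtime(true);
$startMem = memory_get_usage();

$runtime = new Runtime();
$future = $runtime->run(function (int $n) {
    // Build the intermediate result inside the worker thread
    $chunk = [];
    for ($i = 0; $i < $n; $i++) {
        $chunk[$i] = 'value ' . $i;
    }
    return $chunk;
}, [$entries]);

// value() blocks until the task is done and copies the result to the parent
$result = $future->value();

$elapsedMs = (hrtime(true) - $startTime) / 1e6;
$grownBytes = memory_get_usage() - $startMem;

printf(
    "%d entries: %.1f ms, parent memory grew by %d bytes\n",
    count($result),
    $elapsedMs,
    $grownBytes
);
```

Running this with a few realistic sizes would show whether the copy is actually the bottleneck before looking for a shared-memory design.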