How to aggregate data processed in parallel with pcntl

Open fiasco opened this issue 3 years ago • 3 comments

I want to complete independent processing in parallel to speed up the time it takes to complete an operation. I have a CLI (Symfony Console) tool that initiates a collection of independent processing tasks. I currently use PCNTL to fork each process but then have the challenge of returning the processed data back to the parent process.

I built a library with the help of ReactPHP. The library forks a socket server that the child forks can send their data to; the parent process polls it and retrieves the processed data when they are done.
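For comparison, here is a minimal sketch of returning data from forked children without a socket server, using one socket pair per child. It is not the library described above: `forkAndCollect()` is a hypothetical helper, and the sketch assumes a Unix-like system, the CLI SAPI with `ext-pcntl` loaded, and results that survive `serialize()`. There is no error handling for a task that throws.

```php
<?php
declare(strict_types=1);

/**
 * Run each task in a forked child and collect the results in the parent.
 * Hypothetical helper for illustration only.
 *
 * @param array<string, callable(): mixed> $tasks
 * @return array<string, mixed>
 */
function forkAndCollect(array $tasks): array
{
    $readers = [];
    $pids = [];

    foreach ($tasks as $name => $task) {
        // One connected socket pair per child; each side keeps one end.
        $pair = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
        if ($pair === false) {
            throw new RuntimeException('Could not create socket pair');
        }

        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Fork failed');
        }

        if ($pid === 0) {
            // Child: it sees a copy of the parent's memory at fork time.
            fclose($pair[0]);
            fwrite($pair[1], serialize($task()));
            fclose($pair[1]);
            exit(0);
        }

        // Parent: close the write end so EOF arrives once the child is done.
        fclose($pair[1]);
        $readers[$name] = $pair[0];
        $pids[$name] = $pid;
    }

    $results = [];
    foreach ($readers as $name => $reader) {
        // Blocks until this child has closed its end of the pair.
        $results[$name] = unserialize((string) stream_get_contents($reader));
        fclose($reader);
        pcntl_waitpid($pids[$name], $status);
    }

    return $results;
}

// Pre-processed data built once in the parent and inherited by every child.
$shared = range(1, 1000);

$results = forkAndCollect([
    'sum' => fn () => array_sum($shared),
    'max' => fn () => max($shared),
]);
```

The children run in parallel, and the parent drains them one after another. That is fine for a handful of tasks; a version that polls all read ends at once would need `stream_select()` or an event loop.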

This feels more like a workaround than a solution, and I'm wondering if there is a simpler way to do this. From what I can see, ReactPHP is only asynchronous in that it can keep running PHP code while awaiting a response from a stream, but it can't actually run PHP code in parallel like pcntl does?

One thought is that I rearchitect my CLI tool so that ReactPHP calls CLI sub-commands via processes and, in doing so, creates streams I can wait on asynchronously. But I was hoping to utilise the forked memory state of the parent so the children can reuse pre-processed data that they all need.
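The sub-command idea can be sketched with nothing but PHP's built-in `proc_open()` and `stream_select()`. This is an illustration of the general technique, not ReactPHP's API: `runCommands()` is a hypothetical helper, the array form of the command needs PHP 7.4 or later, and `stream_select()` on process pipes is assumed to run on a Unix-like system.

```php
<?php
declare(strict_types=1);

/**
 * Start every command, then read each stdout as data becomes available.
 * Hypothetical helper for illustration only.
 *
 * @param array<string, list<string>> $commands
 * @return array<string, string> captured stdout per command
 */
function runCommands(array $commands): array
{
    $procs = [];
    $streams = [];
    $output = [];

    foreach ($commands as $name => $command) {
        // Only stdout is piped back to the parent.
        $proc = proc_open($command, [1 => ['pipe', 'w']], $pipes);
        if ($proc === false) {
            throw new RuntimeException("Could not start $name");
        }
        stream_set_blocking($pipes[1], false);
        $procs[$name] = $proc;
        $streams[$name] = $pipes[1];
        $output[$name] = '';
    }

    while ($streams !== []) {
        $read = array_values($streams);
        $write = null;
        $except = null;
        // Wait until at least one child has written output or closed its pipe.
        if (stream_select($read, $write, $except, null) === false) {
            break;
        }
        foreach ($read as $stream) {
            $name = array_search($stream, $streams, true);
            $chunk = fread($stream, 8192);
            if ($chunk !== false && $chunk !== '') {
                $output[$name] .= $chunk;
            }
            if (feof($stream)) {
                fclose($stream);
                proc_close($procs[$name]);
                unset($streams[$name], $procs[$name]);
            }
        }
    }

    return $output;
}

// Both children start immediately and run side by side.
$out = runCommands([
    'fast' => [PHP_BINARY, '-r', 'echo 6 * 7;'],
    'slow' => [PHP_BINARY, '-r', 'usleep(100000); echo "done";'],
]);
```

The trade-off mentioned above still applies: each sub-command starts from scratch, so pre-processed data has to be passed in (arguments, stdin, a file) rather than inherited from the parent.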

Anyway, appreciate your thoughts.

fiasco avatar Aug 14 '22 19:08 fiasco

Hey @fiasco, I think reactphp/child-process is what you're looking for.

Forking can be a possibility here, but there are many things you need to be careful about. For example, when forking you create an exact copy of your process, but this doesn't mean that the copies share the same resources. This could potentially lead to race conditions or other unwanted behavior; it all depends on what you're trying to do.

For now, the child-process component should fit your use case. You can also take a look at clue/reactphp-pq.

I hope this helps :+1:

SimonFrings avatar Aug 16 '22 11:08 SimonFrings

Thanks for the help. Child-process looks like it runs things in a separate thread, which I'm getting the impression is just a safer, and therefore de facto, way to conduct processing in parallel. It's a shame we can't do this natively in PHP like truly asynchronous languages can (e.g. JavaScript).

https://github.com/clue/reactphp-pq looks interesting, though it's just example code at the moment - there doesn't appear to be any actual code that does this job. I suspect @clue would run into the same issue as me: it's easy enough to return simple data types (bool, int, string, float, array), but it's difficult to return PHP objects unless their classes have serialize and unserialize magic methods, which means you can't share objects easily between threads.

fiasco avatar Aug 18 '22 16:08 fiasco

> Thanks for the help. Child-process looks like it runs things in a separate thread, which I'm getting the impression is just a safer, and therefore de facto, way to conduct processing in parallel. It's a shame we can't do this natively in PHP like truly asynchronous languages can (e.g. JavaScript).

Yes, the Child Process component lets you spawn a new process that you can offload blocking, CPU-intensive operations to. Last time I checked, and I could be wrong here, JavaScript isn't a multithreaded language. Node.js does do multithreading by offloading the blocking parts. I found a nice write-up about it, for those interested: https://www.digitalocean.com/community/tutorials/how-to-use-multithreading-in-node-js

> https://github.com/clue/reactphp-pq looks interesting, though it's just example code at the moment - there doesn't appear to be any actual code that does this job. I suspect @clue would run into the same issue as me: it's easy enough to return simple data types (bool, int, string, float, array), but it's difficult to return PHP objects unless their classes have serialize and unserialize magic methods, which means you can't share objects easily between threads.

Over the years I've spent a significant amount of time on this and have used both processes and threads for it in PHP. I wrote an overview in https://blog.wyrihaximus.net/2022/07/my-road-to-fibers-with-reactphp/.

To be honest, child processes work, but they are tricky in non-Linux environments, plus they aren't. I wrote my fair share of packages for them, including one that lets you run a given callable: https://github.com/WyriHaximus?tab=repositories&q=child-process&type=&language=&sort= However, child processes are heavy, and you need to serialise whatever you're sending between processes.

Or you can use threads. I used ext-parallel for that, and then you can toss objects between threads directly. However, the bigger the object, the slower it becomes. I set up an entire org to do this with ReactPHP, with a ton of different tooling doing a whole bunch of things with it: https://github.com/reactphp-parallel/. I still think it has a place; however, it hasn't been updated since PHP 8.1 got fibers.
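On the point that anything sent between processes has to be serialised, a small sketch of what that looks like in practice. `Report` is a hypothetical class invented for the example, and the syntax assumes PHP 8.0 or later. Plain objects like this one round-trip through PHP's built-in `serialize()` and `unserialize()` without defining any magic methods; closures, by contrast, cannot be serialised this way.

```php
<?php
declare(strict_types=1);

// Hypothetical value object with no serialization hooks of its own.
final class Report
{
    /** @param list<string> $warnings */
    public function __construct(
        public string $title,
        public array $warnings = [],
    ) {
    }
}

// In a child process: turn the object into a string that fits through a pipe or socket.
$payload = serialize(new Report('audit', ['disk almost full']));

// In the parent: rebuild it, allowing only the class that is expected.
$copy = unserialize($payload, ['allowed_classes' => [Report::class]]);

// Closures are one of the things that do not survive this.
$closureFailed = false;
try {
    serialize(fn () => 1);
} catch (Throwable $e) {
    $closureFailed = true;
}
```

The class definition has to be loadable on both sides, which is usually the case when parent and child share the same autoloader.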

TL;DR: We provide the tools to make this work. For all our components, including the filesystem, we make sure they work in a non-blocking way, which means that some of the extensions we suggest might use threads under the hood for you, just like Node.js does.

WyriHaximus avatar Aug 22 '22 19:08 WyriHaximus

I believe this has been answered, so I'm closing this for now. Please come back with more details if this problem persists and we can always reopen this :+1:

clue avatar Jan 05 '23 12:01 clue