PHP-Daemon
Integrate Gearman (or similar) to provide a plugin-like feature that distributes workers across multiple servers, not just multiple processes on the same server.
This is a large feature that I'm currently thinking about. I'm trying to sketch out what the feature will look like, and how best to integrate it into the existing codebase.
Goals:
- Integrate an off-the-shelf queuing solution (eg Gearman, Kestrel, Beanstalkd, ActiveMQ, etc) that could be used instead of the existing SysV message queue.
- Take advantage of the ability an OTS queue provides to distribute workers across multiple machines in a cluster.
- Implement this in a way that is idiomatic to PHP Simple Daemon and provides for easily moving between queueing providers.
Implementation Thoughts:
- Create a new directory: Core/Worker/Queues
- Move all of the SysV-specific code out of Core_Worker_Mediator into Core_Worker_Queues_SysV
- Add an optional argument to the Core_Daemon::create_worker() method to choose a specific queue.
- I'll probably need to create a unique interface for each queue, or possibly create a queue object in your daemon that you inject into the Worker (as the optional argument mentioned above), or something similar, to deal with the unique needs of different queues. SysV needs a malloc, for example. Gearman would need to know the IP/port of Gearmand. That sorta thing.
- I sorta like the dependency-injection style. It would work just as well for workers implemented as closures as it does for IWorker objects. You would instantiate a Queue object before you create the worker, configure it however you want, and then pass it to create_worker().
- An emphasis will be placed on making it easy to move between different Queue types, but that may not apply to code written for v2.0 workers. It may be possible, but I'm not going to worry about that. You may pay a small price if you want to upgrade to 2.1-style queues. If the penalty is too high, I'd consider creating this feature in a 3.0 branch and still supporting bug fixes on the 2.x branch.
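To make the dependency-injection idea concrete, here is a minimal sketch. Queue, InMemoryQueue, Daemon and the call() helper are hypothetical stand-ins, not the PHP Simple Daemon API; the in-memory queue replaces a real provider, and call() runs the worker in-process instead of forking, so the example can run on its own.

```php
<?php
// Hypothetical sketch: these names are NOT the PHP Simple Daemon API.

interface Queue
{
    public function send(string $message): void;
    public function receive(): ?string; // null when the queue is empty
}

// Placeholder provider. A SysV provider would take a memory size in its
// constructor, a Gearman provider a host and port, and so on.
final class InMemoryQueue implements Queue
{
    private $messages = [];

    public function send(string $message): void
    {
        $this->messages[] = $message;
    }

    public function receive(): ?string
    {
        return array_shift($this->messages);
    }
}

final class Daemon
{
    private $workers = [];

    // The optional third argument selects the queue; without it the
    // daemon falls back to a default provider.
    public function create_worker(string $alias, callable $worker, ?Queue $queue = null): Queue
    {
        $queue = $queue ?? new InMemoryQueue();
        $this->workers[$alias] = ['worker' => $worker, 'queue' => $queue];
        return $queue;
    }

    // Sends a job through the worker's queue and runs the worker on it.
    // Stands in for the real fork-and-mediate machinery.
    public function call(string $alias, string $payload)
    {
        $entry = $this->workers[$alias];
        $entry['queue']->send($payload);
        return ($entry['worker'])($entry['queue']->receive());
    }
}

// Configure the queue first, then inject it.
$daemon = new Daemon();
$queue = new InMemoryQueue();
$daemon->create_worker('upper', function (string $s): string {
    return strtoupper($s);
}, $queue);
echo $daemon->call('upper', 'primes'), PHP_EOL; // prints PRIMES
```

Because the worker only ever sees the Queue interface, swapping providers would mean changing the object passed to create_worker() and nothing else.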
Any comment/input is appreciated. Is this a feature that interests you? Would you be interested in helping on any part of it? In beta testing?
I'd expect this feature to be on a Jan/Feb 2013 timeline.
Interesting feature to have. Is this related to the memcache branch?
Thanks for your work, using PHP Daemon is a pleasure.
Thanks for the kind words -- this has obviously been a labor of love for me. PHP needs more great libraries/packages and that's my goal for this.
Anyway, about the memcache branch -- yes and no.
So the inter-process communication in the Worker API has two parts: a SysV message queue that passes messages between processes for coordination, and a generic shared memory store that we use to pass args and return values back and forth. You can kind of think of the message queue as the "stack" and the Shared Memory (SHM) as the "heap."
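A rough illustration of that split, with plain PHP arrays standing in for the SysV message queue and the shared memory segment: the "stack" only carries small call ids, while the "heap" holds the serialized arguments and return values. WorkerIpc and its methods are hypothetical names for this sketch, not classes from the codebase.

```php
<?php
// Hypothetical sketch of the "stack" vs "heap" split. Arrays stand in for
// the SysV message queue and the shared memory segment.
final class WorkerIpc
{
    private $calls = [];   // "stack": call ids waiting for a worker
    private $returns = []; // "stack": call ids whose results are ready
    private $heap = [];    // "heap": serialized args and return values, by id
    private $nextId = 1;

    // Parent: park the arguments in the heap and enqueue only the id.
    public function call(string $method, array $args): int
    {
        $id = $this->nextId++;
        $this->heap[$id] = serialize(['method' => $method, 'args' => $args]);
        $this->calls[] = $id;
        return $id;
    }

    // Worker: dequeue an id, then load the call details from the heap.
    public function next_call(): ?array
    {
        $id = array_shift($this->calls);
        return $id === null ? null : [$id, unserialize($this->heap[$id])];
    }

    // Worker: overwrite the heap slot with the result and signal the parent.
    public function reply(int $id, $result): void
    {
        $this->heap[$id] = serialize($result);
        $this->returns[] = $id;
    }

    // Parent: dequeue a finished id, read its result, and free the slot.
    public function next_return(): ?array
    {
        $id = array_shift($this->returns);
        if ($id === null) {
            return null;
        }
        $result = unserialize($this->heap[$id]);
        unset($this->heap[$id]);
        return [$id, $result];
    }

    public function heap_size(): int
    {
        return count($this->heap);
    }
}

$ipc = new WorkerIpc();
$ipc->call('is_prime', [7]);
[$id, $call] = $ipc->next_call();
$ipc->reply($id, true);
[$doneId, $result] = $ipc->next_return(); // $result === true
```

The large payloads only ever touch the heap, which is why the heap, not the queue, is where size-related instability shows up.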
Well, currently, the SHM "heap" is definitely the weakest link, and if an application has stability issues with the worker API (e.g. it keeps crashing), then SHM is probably to blame. There are several discrete issues: for smaller memory allocations it's very stable, but for bigger uses (passing or returning LARGE arrays or objects) it's not as stable as I'd want it to be.
So the work in the Memcache branch is my initial work from a couple of months ago on abstracting the storage medium used for the "heap". Once it's abstracted, we can create implementations that match the use case of a specific application. Memcache is the most obvious, since we already have a memcache wrapper library in the application (in the first version, a memcache key was the only "lock" provider available).
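One way to picture that abstraction: code that stores args and return values depends only on a small storage interface, and each backend implements it. HeapStore, ArrayHeap, MemcacheHeap and store_args() are hypothetical names for this sketch. MemcacheHeap assumes the PECL memcache extension (Memcache::set/get/delete) and is not exercised below.

```php
<?php
// Hypothetical sketch of an abstracted "heap". Names are illustrative only.
interface HeapStore
{
    public function set(string $key, string $value): void;
    public function get(string $key): ?string; // null when the key is missing
    public function delete(string $key): void;
}

// In-process backend, handy for tests.
final class ArrayHeap implements HeapStore
{
    private $items = [];

    public function set(string $key, string $value): void { $this->items[$key] = $value; }
    public function get(string $key): ?string { return $this->items[$key] ?? null; }
    public function delete(string $key): void { unset($this->items[$key]); }
}

// Assumes the PECL memcache extension; Memcache::get() returns false on a miss.
final class MemcacheHeap implements HeapStore
{
    private $memcache;

    public function __construct(Memcache $memcache)
    {
        $this->memcache = $memcache;
    }

    public function set(string $key, string $value): void
    {
        if (!$this->memcache->set($key, $value)) {
            throw new RuntimeException("memcache set failed for $key");
        }
    }

    public function get(string $key): ?string
    {
        $value = $this->memcache->get($key);
        return $value === false ? null : $value;
    }

    public function delete(string $key): void
    {
        $this->memcache->delete($key);
    }
}

// Callers only see the interface, so the backend is chosen at wiring time.
function store_args(HeapStore $heap, int $callId, array $args): void
{
    $heap->set("call_$callId", serialize($args));
}

$heap = new ArrayHeap();
store_args($heap, 5, [1, 2]);
```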
In that design, though, we'd still use kernel SysV message queues. If we could abstract the entire IPC stack and drop in an off-the-shelf (OTS) queue solution, that could be pretty great. I mean, it's a TON more complex and wouldn't be right for most applications. But if you really need to know that the jobs you send to workers won't get dropped even if a worker crashes, then an OTS queue like Gearman or Beanstalkd, one that persists the queue to disk in some way, could be the right tool for the job.
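The "won't get dropped" guarantee usually comes from a reserve/acknowledge cycle rather than from persistence alone. The sketch below illustrates that pattern in memory: a worker reserves a job, the job is removed only when the worker deletes it, and if the worker dies the reservation expires and the job becomes ready again. Beanstalkd's reserve/delete with a time-to-run works along these lines, but ReliableQueue is a hypothetical class, not any real client API, and it leaves out disk persistence.

```php
<?php
// Hypothetical sketch of the reserve/acknowledge pattern. Time is passed in
// explicitly so the behaviour is easy to follow and test.
final class ReliableQueue
{
    private $ready = [];    // id => payload
    private $reserved = []; // id => [payload, deadline]
    private $nextId = 1;

    public function put(string $payload): int
    {
        $id = $this->nextId++;
        $this->ready[$id] = $payload;
        return $id;
    }

    // Hand out the lowest-id ready job and note when its reservation expires.
    public function reserve(int $now, int $ttr): ?array
    {
        $this->requeue_expired($now);
        if ($this->ready === []) {
            return null;
        }
        $id = array_key_first($this->ready);
        $payload = $this->ready[$id];
        unset($this->ready[$id]);
        $this->reserved[$id] = [$payload, $now + $ttr];
        return [$id, $payload];
    }

    // The worker acknowledges success; only now is the job gone for good.
    public function delete(int $id): void
    {
        unset($this->reserved[$id]);
    }

    // Jobs whose worker never acknowledged them go back to the ready list.
    private function requeue_expired(int $now): void
    {
        foreach ($this->reserved as $id => [$payload, $deadline]) {
            if ($deadline <= $now) {
                unset($this->reserved[$id]);
                $this->ready[$id] = $payload;
            }
        }
        ksort($this->ready);
    }
}

$q = new ReliableQueue();
$q->put('factor 1234567');
$job = $q->reserve(0, 10); // worker takes the job, then crashes before delete()
```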
Anyway, sorry for the long-winded answer.
And really -- thank you for the appreciation. Truly appreciated.
Shane
+1
Alright, I started development on this yesterday in the feature_abstract_ipc branch for now.
I created a new interface, Core_IWorkerVia, and a new class, Core_Worker_Via_SysV, that implements it. I'm moving all of the SHM & MQ code from the Mediator class over to this new class.
The basic breakout is that Via classes (you can imagine a Core_Worker_Via_Gearman next) will abstract the communication medium but know nothing of the content, and the Mediator class will of course know nothing about the communication medium.
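A minimal sketch of that separation of concerns: the Via only moves opaque strings, while the Mediator builds and decodes calls without knowing how they travel. WorkerVia, LoopbackVia, Mediator and their method names are assumptions for illustration; they are not the actual Core_IWorkerVia interface in the branch.

```php
<?php
// Hypothetical sketch of the Via/Mediator split; names are illustrative.

// Transport only: sees strings, never calls or arguments.
interface WorkerVia
{
    public function put(string $message): void;
    public function get(): ?string; // null when nothing is waiting
}

// Loopback transport for the sketch; SysV or Gearman would go here instead.
final class LoopbackVia implements WorkerVia
{
    private $buffer = [];

    public function put(string $message): void { $this->buffer[] = $message; }
    public function get(): ?string { return array_shift($this->buffer); }
}

// Content only: knows about methods and args, never about the medium.
final class Mediator
{
    private $via;

    public function __construct(WorkerVia $via)
    {
        $this->via = $via;
    }

    public function call(string $method, array $args): void
    {
        $this->via->put(serialize([$method, $args]));
    }

    // Pull one call off the medium and run it against $worker.
    public function run_next(object $worker)
    {
        $message = $this->via->get();
        if ($message === null) {
            return null;
        }
        [$method, $args] = unserialize($message);
        return $worker->$method(...$args);
    }
}

$mediator = new Mediator(new LoopbackVia());
$mediator->call('add', [2, 3]);
```

Swapping LoopbackVia for another transport would leave Mediator untouched, which is the point of the split.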
So far the branch is not runnable. When I have a working prototype (even a super pre-alpha one), I'll let people know. This is going to take a while.
Such good news, thank you. Ready to help with testing. I will also be interested in looking into a RabbitMQ class :-) Best.
Good news everyone!
Ok, the branch feature_abstract_ipc at f2681f7d3ab6 should now be functionally equal to master.
The first step was to remove all the IPC code from the Mediator class. At the same time, I changed the call struct that held serialized call data from a basic stdClass to a normal class. It's made the Mediator object much simpler.
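For anyone wondering what moving from a stdClass to a normal class buys: declared fields, named status constants and small helper methods, plus the ability to restrict unserialize() to that one class. The field and status names below are assumptions for illustration, not the branch's actual call class.

```php
<?php
// Hypothetical sketch of a call record as a real class instead of a stdClass.
final class Call
{
    const UNCALLED = 0;
    const CALLED = 1;
    const RETURNED = 2;

    public $id;
    public $method;
    public $args;
    public $status = self::UNCALLED;
    public $result;

    public function __construct(int $id, string $method, array $args)
    {
        $this->id = $id;
        $this->method = $method;
        $this->args = $args;
    }

    public function called(): void
    {
        $this->status = self::CALLED;
    }

    public function returned($value): void
    {
        $this->result = $value;
        $this->status = self::RETURNED;
    }

    public function is_done(): bool
    {
        return $this->status === self::RETURNED;
    }
}

// Only Call may be rebuilt when the serialized record comes back over IPC.
$wire = serialize(new Call(1, 'is_prime', [7]));
$call = unserialize($wire, ['allowed_classes' => [Call::class]]);
$call->returned(true);
```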
The next step -- and one which will probably take more time -- is creating the next Via class. But while this happens, if you're testing anything, feel free to grab that branch and test using it. I'll keep that branch as stable as possible.
Edit:
- Only the PrimeNumbers example has been updated to use the new Via feature. The other examples may or may not work. Haven't tested them yet.
- The Debug classes haven't been updated yet. So the Worker debug shell is currently unavailable.
Edit Again: The Debug features are now integrated back into this branch. An abstract Core_Lib_DebugShell class has been created for that, getting Debug code out of the class hierarchy.
A lot more progress. At 15a42ee, feature_abstract_ipc is nearly through refactoring and testing. In addition to what I mentioned before, I also cleaned up process management, which was scattered across Core_Daemon and Core_Worker_Mediator. Now all process forking and reaping lives in a ProcessManager class.
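As a sketch of what centralizing that bookkeeping can look like: one object forks children, remembers which worker group each PID belongs to, and reaps them without blocking. This ProcessManager is hypothetical, not the class in the branch. fork() assumes the pcntl extension (CLI only); the waiter is injectable so reap() can be exercised without real child processes.

```php
<?php
// Hypothetical sketch of fork/reap bookkeeping in one class.
final class ProcessManager
{
    private $processes = []; // pid => worker group
    private $waiter;

    public function __construct(?callable $waiter = null)
    {
        // Default: non-blocking waitpid for any child. Returns a pid,
        // 0 when no child has exited yet, or -1 on error (e.g. no children).
        $this->waiter = $waiter ?? static function (): int {
            $status = 0;
            return pcntl_waitpid(-1, $status, WNOHANG);
        };
    }

    // Fork a child that runs $task and exits; the parent records the pid.
    public function fork(string $group, callable $task): int
    {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException("Could not fork a '$group' process");
        }
        if ($pid === 0) {
            $task();
            exit(0);
        }
        $this->track($pid, $group);
        return $pid;
    }

    public function track(int $pid, string $group): void
    {
        $this->processes[$pid] = $group;
    }

    // Reap every child that has exited; returns pid => group for each one.
    public function reap(): array
    {
        $reaped = [];
        while (($pid = ($this->waiter)()) > 0) {
            if (isset($this->processes[$pid])) {
                $reaped[$pid] = $this->processes[$pid];
                unset($this->processes[$pid]);
            }
        }
        return $reaped;
    }

    public function running(string $group): int
    {
        return count(array_keys($this->processes, $group, true));
    }
}
```

In a daemon, reap() would typically be called from the main loop or after a SIGCHLD, and running() tells the caller how many workers in a group need replacing.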
Now I can do what I planned 2 weeks ago -- write another Via class. Probably using a simple message queue of some kind. Maybe Beanstalkd.
sweet :-)
I vote for Beanstalkd support since we already use it in our infrastructure.