PHP-DI Add functionality to reset container entries

The purpose of this change is to improve the support for resource management and introspection of the Container class by adding the following methods:

Container::reset - unsets an entry in the container
Container::initialized - checks if an entry is initialized

These method names were chosen for consistency with the same features in Symfony's Dependency Injection component.

The idea here is that if you have an entry in the container that is no longer needed (say, for example, a database connection), then you can remove the entry from the container with reset to free the resource. If it is needed again later, no extra work is needed to get it from the container according to its definition, and a new instance will be constructed.

The initialized method is related in the sense that there is no existing way to see if an entry is present in the container. It is distinct from the has method, which only reports if an entry can be supplied by the container, not if it has. You may have some logic that depends on if a resource has been allocated yet (i.e. before the first get or after reset).

Two practical use cases:

A long-running API service has a database connection in its container. After a call to the API completes, the database connection is closed to free up resources and ensure that the connection does not time out. Because the database connection in the container is now invalid, careful measures need to be put in place to ensure that a reconnection to the database is performed before the object in the container is used on the next API call. This essentially removes the "on demand" aspect of the database connection, because the call may not even need a database connection. By using Container::reset, no special reconnection scaffolding is needed; the container will automatically create a new database connection regardless of where or when the next Container::get call is made.
A logging mechanism for the same API service will log certain properties of the database connection after each call. If no database connection has been opened, then no connection should be made to avoid unnecessary resource consumption. Without the introspection provided by Container::initialized, there is no way to determine if a connection has already been opened through the container interface.

May 24 '22 04:05 hemberger

@mnapoli When you have a moment to spare, I'd be really grateful to hear your thoughts about this PR. Is it fundamentally a non-starter for you, or are there aspects of it that you'd like to see improved or changed? I think this exposes some very powerful and practically useful functionality at almost no expense. I'm not picky about the details though! :)

Jul 29 '22 08:07 hemberger

Hey, sorry for the delay. I've been on the fence about this, and I'm still on the fence about it. Adding a root core API is not something I want to do lightly, and I'm not sure about the use case (and whether it's the best way to tackle that). Seeing as it's the first time it's been suggested, I'd rather wait and see if there are more people interested in this and why.

Aug 10 '22 14:08 mnapoli

Thanks for the response, and I certainly appreciate your need to carefully weigh whether or not to add to the API! While I can't speak for anyone else, what I can point out is that these features have been deemed sufficiently useful to be included in Symfony DI (introspection was added in symfony/symfony@b7b26af9c5acd52f0bf390ea9e559948b27703fa and resetability in symfony/symfony@445774592920ecd9a1e752f3c447507f11193805).

At the end of the day, the purpose of these two methods is to give the user more control over how their resources are managed. Some users may not find any need for this, but for others it will be indispensable. Those who need it may have already looked elsewhere for DI solutions (I like PHP-DI, and hope to continue being able to use it!).

Would it be worthwhile for me to split off the introspection method (i.e. initialized) into a separate PR? This is the highest priority for me, as its absence is a blocking issue, and I think its utility is much more straightforward.

I should also note that I've tried to extend Container to my own ResettableContainer class. However, for supporting optional compilation, I run into issues due to PHP not allowing multiple inheritance (a ResettableCompiledContainer class needs to extend both ResettableContainer and CompiledContainer, and as such there is no single parent class that can fulfill that role). There are workarounds using traits and union types, but they are exceedingly complex compared to the two one-line methods proposed here. One thing that would help simplify extending the Container interface, if that's the preferred solution, is if the abstract CompiledContainer class was a Trait instead of a class.

I'm also happy to provide more concrete code examples of any of these issues if it would be helpful in assessing this (or similar) PRs. Thanks again for your time!

Aug 22 '22 00:08 hemberger

Thanks for the details. I had a quick chat with the author of Framework X (React-based framework that works with a keep-alive process). I confirmed with them that what actually makes more sense in these scenarios is to have "resettable services", instead of having the container deal with that concern.

I.e. for example a "MysqlConnection" class that would either expose methods to disconnect/reconnect, or even automatically reconnect itself in case the connection is lost. Then it's up to the application to fetch the services that need to be reset (it's only a small share of all services anyway) and reset them by calling the appropriate API on the service.

Hopefully that makes sense?

Would it be worthwhile for me to split off the introspection method (i.e. initialized) into a separate PR? This is the highest priority for me, as its absence is a blocking issue, and I think its utility is much more straightforward.

Would it be the same use case for this one too?

Aug 22 '22 15:08 mnapoli

This would turn the container object more into a registry, and in order to be useful, all the other code needs to have access to the container instance in quite a lot more places. I would consider that to be some sort of anti-pattern.

Aug 23 '22 10:08 SvenRtbg

Thanks all for your feedback on this PR. I suppose I'm coming at this from the perspective of a legacy project (20+ years old) that I've been laboriously modernizing for the past 5 years. It does not have any abstract infrastructure for services, so it's possible that I'm trying to compensate for that by misusing the DI container. I appreciate the distinction between a DI container and a registry, though that distinction is perhaps a bit academic if it only takes 2 one-line methods to turn one into the other! :)

@mnapoli Please feel free to close this PR at your discretion, as you've given this more than a fair amount of consideration. But if you (or some other kind soul) would like to help deconstruct some of my lingering beliefs that these methods still provide value, and perhaps educate me on how to do things better or point me to some good resources, that would be much appreciated!

Thanks in advance to anyone who takes the time to help me out here. Read on for details...

What I'm struggling to grasp is how to avoid the overhead of checking if a service is ready to use while retaining lazy initialization in the context of resettable service providers. Taking the database connection as an example, with a resettable container, I know that every time I inject a database connection, it will be connected, regardless of whether or not it was disconnected after the last use (by virtue of having been unset in the container if it was disconnected previously). This is different from the case of a keep-alive service, because I'm not trying to keep it connected; rather, I'm trying to deallocate the resource until it is needed again (which could be in 1 second, or 1 minute, or 1 week!). This is natural for the DI container to do, because it knows how to create resources that are not initialized in the container.

This brings up a related point: there is no mechanism to reset an entry in the container to its uninitialized state. In my mind, this is similar to the limitation of static class properties, since PHP does not allow them to be unset once they are set. The best you can do is set them to null, but that solution is especially unhelpful with the DI container, since you don't want to be injecting null where you expect an object.

The case of introspection is a bit different. While it's a helpful partner of the proposed reset method, it can stand on its own as well. My primary use case is in logging errors, where I want to dump the properties of some object that the DI container is responsible for creating, but only if it has already been created. For example, if connecting to the database is throwing an error, then I don't want to try to log the error in the database. But if the DI container is what creates the connection object, then how can I tell if it's been created without introspection? I can't get() it, because that would try to create it.

Hopefully an expert can provide some illumination for me. :vulcan_salute:

Aug 26 '22 05:08 hemberger

I have only seen you mentioning the database connection as an example. Maybe there are no other useful examples for the container to warrant implementing reset(). I'll try and illustrate what my strategy would be, dealing with that database connection.

Your current strategy seems to be lazy initialization, however it seems like you don't know how to proceed after everything was initialized, like how to roll back to that uninitialized state.

I once had a PHP script that was running LONG time. Turned out that if the database hadn't been used for too long, the connection timed out without the script noticing, and the next query failed. How to detect the connection is still alive? mysqli_ping() So I was pinging the db before every query, and if the connection was down, I would connect again - this essentially is lazy initialization, but is happening more than once, but also on the first query attempt. If you are using some different DB access layer, many of them also have a ping, and PDO does not, but if you run a query like "SELECT 1" and it fails, catch the PDOException, assume the connection is down and reconnect.

If you explicitly know when to disconnect from the DB, then the DB object can keep track of that info instead of guessing/attempting, so the initial state is "unconnected", and query first connects if unconnected and if successful sets the state to "connected", and there is a disconnect methods that tears down the connection and resets the state to unconnected. This entirely happens inside the database object, without any need from a container to recreate the whole object and resetting the one-time initialization logic that is common for most of the PHP scripts that don't have/want to deal with resource management because memory is always available enough.

What about using the container then? It can still provide your code with the database object, but because the specific DB logic for connection management is contained inside that object, you don't have to mess with the container resetting things. Also consider that the container would only ever hand out one single instance of the database object, so any connection is shared between multiple consumer objects, and any state change also is shared, meaning that if object A decides it is time to disconnect, then object B ends up with the same DB object that had just shut down it's connection. You would have to reset the container not only for the DB object, but for objects A and B to get them recreated, because they would keep their old database object otherwise. If you encapsulate connection management inside the DB object, object B would just issue a query, and the database connection will get recreated automatically.

On the other hand, if there really only is one DB object in your code, the connection itself is probably not your concern. You'd have to pay attention to database resources that are created by queries. These do contain some data, however depending on the database layer, the data itself might stay within the actual database and will be streamed to the PHP client when consumed. Usually the result of a query is immediately consumed and converted into an array or object list by looping over the result. This is consuming PHP memory, and has to be carefully managed if this is an issue. Using generators instead of establishing in-memory data for loops might be an alternative, so the memory is consumed somewhere else, but not within PHP.

That's my generic approach. You see there are alternatives available, but I might still completely miss your problem. Appreciate your feedback, though. Apologies to @mnapoli for capturing the issue tracker for out-of-scope discussions.

Aug 26 '22 08:08 SvenRtbg

Thanks for your detailed reply @SvenRtbg! If it is inappropriate to continue the discussion here, please let me know (though I am finding it immensely helpful).

I once had a PHP script that was running LONG time. Turned out that if the database hadn't been used for too long, the connection timed out without the script noticing, and the next query failed.

Yep, this is exactly my experience as well. I originally went the mysqli_ping route, but then I wondered why I was going through the effort to keep the connection alive when it was unused 99.9% of the time. This is when I started thinking about explicit disconnect/reconnect and, later, lazy deallocate/reconnect.

For a little extra background, the primary purpose of my code is to handle web requests. So it is appropriate to have a single database connection for the lifetime of the process (usually just a fraction of a second). A reset method is not needed in this context (though introspection still helps with error handling, as I've mentioned earlier). Then I have a handful of daemonized processes that listen for events on various websockets (i.e. processing a command run in Discord). While the PHP process in this case is eternal, due to the infrequency of these events, the database connection should be opened only if injected somewhere during event processing, and then closed after the event is processed (and only if it was opened).

I think a few schematic code snippets may be helpful here. So looking at the daemonized PHP process, here are the various options that seem like they're available (and I may very well be missing or misrepresenting a design pattern that has been suggested!):

Make the Database class responsible for handling disconnect/reconnect.

while (true) {
    if (has_event()) {
        process_event();
    }
    // Problem 1: How do you lazily disconnect (to deallocate the unneeded resource)
    //            if the event didn't open a database connection?
}

class Database {
    function query(...) {
        $this->reconnect(); // Problem 2: Incurs unnecessary overhead on vast majority of queries
        $this->connection->query(...);
    }
}

Avoid overhead in the Database class (i.e. no reconnection logic) by making the daemon responsible for explicitly reconnecting before processing events.

while (true) {
    if (has_event()) {
        $database->reconnect(); // Problem: Connection won't be needed every time.
        process_event();
        $database->disconnect();
    }
}

Deallocate the Database instance in the DI container.

while (true) {
    if (has_event()) {
        process_event();
        $container->reset(Database::class); // Closes connection if used, otherwise does nothing
    }
}

I don't see any downsides with option (3): there's no overhead in the Database class (and it remains "stateless", which is an added benefit), connections are not opened by event processors that don't need it, and we close the connection after the event is processed. Performance and resource management are optimal, as far as I can tell.

My best guess is that experts who want to achieve the behavior exhibited by (3) have some additional layer on top of their database class (e.g. a service provider?), so that the DI container is not overloaded for this purpose (even though, as we can see, it's very easy for it to do this). Is this the design pattern that I'm missing, or is there something else that I'm not picking up on?

Thanks again, everyone, for your advice and your patience!

Aug 26 '22 22:08 hemberger

Excellent illustration of your though process (and as long as we are not getting complaints, I'll just continue responding to the discussion, I believe it might be helpful for passive readers as well).

You describe three different approaches. What are they doing on the lowest level we are discussing here, in terms of managing the db connection? Let's compare what is happening in detail for a couple of situations.

Situation 1: Started from scratch, there is no DB connection established. Situation 2: Already in use, there is an established DB connection. Situation 3: DB connection had been established before, but was explicitly disconnected afterwards. Situation 4: DB connection has timed out, but not explicitly shut down.

And we'll only be looking at one single loop run that might emit one or more DB queries, with any of these situations describing what happened before this.

Approach 1: The code explicitly does not know if there is a connection.

In situation 1, the ping will immediately fail without even communicating with the database, then the db connection is created, then the query is sent. A second query would ping the db with a minor communication overhead, then send the query.
In situation 2, the ping will send minor communication, then the query is sent.
In situation 3, it is like in situation 1.
In situation 4, it is like in situation 1. The managing code could also keep track of explicit shutdown of the connection and avoid attempting to ping, reconnecting directly instead. So in summary, this approach will send out a minor communication to the database before any query is sent. And in case a connection has to be established, it is only created if really necessary.

Approach 2: Explicit connect and disconnect.

In all situations, the code establishes a connection before the query is sent, then explicitly shuts down the connection.
Possibly more than one query could be sent in one connection step.
Still plenty of connect/disconnect calls happen.
It is quite impossible to end up in situations 2 or 4 because the connection is not staying open after the work step is finished.

Approach 3: Implicit connect and disconnect.

In all situations, the code establishes a connection before the query is sent, then implicitly shuts down the connection by destroying the database object inside the container.
Possibly more than one query could be sent in one connection step,
Still plenty of connect/disconnect calls happen.
It is again quite impossible to end up in situations 2 or 4.

So explicitly shutting down the database connection in my eyes is doing exactly the same as attempting to reset the object in the container. I'd assume that even without an explicit shutdown in the database destructor, the resource variable goes out of scope and will be cleaned up. Hoping that the connection remained and the next clean db connect call would just reuse this connection is not guaranteed.

I do admit there is a seemingly minor overhead for the first approach generating mysql pings before every query. I have tried to identify what is happening technically during this call, but ended up finding nothing specific in the PHP source code - it seems like they just forward the call to the C API function. The MySQL API documentation isn't really mentioning what is happening on the low level either.

But the same is true for me not exactly knowing what is happening during mysqli_connect(). I'm sure this is doing more, as it has to resolve the hostname into an IP, so there is at least a DNS lookup involved unless you connect to an IP directly. The TCP connection is using a 3-way-handshake to become established (unix domain sockets might be more lucky), which is adding to the latency a bit, then connection has to exchange credentials, before any query is allowed. So despite people claiming that when using MySQL establishing a connection is quick, and because we are PHP, we don't even have to use database connection pools, if we really want to compare technical effort, I would think that mysqli_ping() on an established connection has advantages over having to connect and disconnect plenty of times. The only argument for disconnecting really would be if the database is running out of available connections, so it would be beneficial for the whole application cluster to release unused DB connections as soon as possible.

I'd argue that unless there are very specific situations that require very special handling, the comparison between ping and connect approaches will not yield an obvious winner. If you want to counter this argument, please run performance tests on your setup to get a valid answer for your setup. :)

Moving on to another aspect: What about code clarity, readability and where stuff is supposed to happen?

In my eyes approach 1 with a db class that is able to manage the connection itself is quite easy to consume. It removes the concerns about when to clean up the db connection in the code, because there is no way to do that. Note that I explicitly do not talk about query results here - they might get quite big and are definitiely in scope of proper memory usage management. But the connection itself isn't using too much memory by itself, I'd think. Couple hundred bytes, maybe a kilobyte. So my argument here is: Let's worry about database results, don't worry about the single connection itself.

Still approach 2 might be a valid way to go, because it clearly communicates what happens: The connection is created and destroyed. It conveys clarity about what happens, making it obvious to the developer that if they want to query the database, they should connect before doing so. And still the class itself may track the status internally and just connect lazily if a query was not making a connection before, really leaving only a case for disconnecting.

I want to mention here that one single instance of a db class should explicitly not be used to connect to more than one single database. If you have more than one database. use one object per database connection, otherwise it'll become a mess quite easily.

Approach 3 does not communicate the intent at all. And something that is missing in this short illustrative demo code is that once you reset the container, how to you fetch a fresh instance from it? You have to have the container available somewhere deeper inside your code, maybe inside process_event(), maybe even deeper. This quickly would turn into the "service locator antipattern", where you pass the container basically into every object and let that object fetch whatever it needs from the container directly.

Still this has the problem how to detect that the container should be asked if a new object should be fetched. I assume somewhere in your business object, you'd do something like this:

class SomeEvent {
  private DbClass $dbclass;
  private Container $container;

  public function __construct(Container $container)
  {
    $this->container = $container;
  }

  public function do_some_event()
  {
    $this->fetchDbClass();
    // then do some query
  }
  private function fetchDbClass()
  {
    if (!$container->initialized(DbClass::class)) { // no existing instance available?
      $this->dbclass = $this->container->get(DbClass::class); // recreate one
    }
  }
}

So instead of explicitly dealing with creating a DB connection here, you are dealing with object creation, potentially hiding the intent.

Also consider that once you have created an instance of SomeEvent, and it got an instance of DbClass, and you were somehow not adding the "fetch fresh from container" code, resetting the container wouldn't do anything to the object itself, it would keep the old DB connection and also not disconnect from the database automatically. Or if you added some code that will disconnect in that DbClass, then SomeEvent will unexpectedly query the DB without a connection, and fail.

Note that probably you are fetching SomeEvent classes from the container, and you probably have to reset these classes as well in order to fetch a new instance of them, otherwise the existing shadow instance inside the container would still host the old instance of DbClass.

In the end, there is no real difference between explicitly disconnecting and reconnecting the DB object, and resetting something in the container, expecting it to be disconnecting and reconnecting to the DB. It is technically the same thing on low level, but looks quite different on higher levels of coding.

Aug 27 '22 13:08 SvenRtbg

Hey everyone, yes feel free to continue discussing this topic 👍

Approach 2 sounds very good to me, I would even simplify it to:

while (true) {
    if (has_event()) {
        process_event();
        $database->disconnect();
    }
}

I would expect the Database class to automatically establish a connection on the first query (checks if already connected or not). But that's a detail.

In any case, approaches (2) and (3) and very very similar. It's just a different implementation, and approach (2) doesn't involve the container at all.

When I look at Symfony's resettable services, it's basically the same thing: https://symfony.com/doc/current/reference/dic_tags.html#kernel-reset

During the kernel.terminate event, Symfony looks for any service tagged with the kernel.reset tag to reinitialize their state. This is done by calling to the method whose name is configured in the method argument of the tag.

So Symfony's container would basically call $database->disconnect() for you, it wouldn't "clear" the service from the container. And that makes sense to me.

The way I would approach this would be:

do I need to reset 1 or 2 services -> I would hardcode that in my loop, life is too short to create generic solutions to specific problems
do I need to reset 10+ services -> I would create a container entry (e.g. loop_resettable_services) that is an array, containing a reference to all container entries I want to reset on each loop. I would create a ResettableService interface with a reset() method. That way my event loop would be able to fetch all services and call the reset() method on each one of them. (this is basically what Symfony container tags are)

Note: yes, that would mean instantiating all of these services, even if not needed. But since we are talking about a long-lived process (HTTP server or similar), it's not crazy to assume all these services might be instantiated at some point, so they will all be created in the lifetime of the server. So we're not wasting too much here. Also, that also assumes all these resettable services are lazy with their expensive operations (e.g. don't create the DB connection in the constructor, wait for the 1st query instead), but I consider that best practice anyway.

Aug 31 '22 08:08 mnapoli

Thanks to both @SvenRtbg and @mnapoli for your helpful (and thorough!) notes. While I think this may be an enlightening conversation for others to read, I don't want to clutter the PR list unnecessarily, so I will close this. Hopefully interested parties can still find it. Thanks again!

Jan 13 '23 20:01 hemberger