Support unique yet stable identifier generation in "Specification"-style designs
Feature Request
| Q | A |
|---|---|
| New Feature | yes |
| RFC | yes |
| BC Break | no |
Summary
I've seen quite a few applications of the "Specification" pattern. In essence, a Specification is a "business" or "domain logic" class that encapsulates a particular business condition, filter expression or the like. There are various interpretations of and approaches to this pattern, but here, this or that might be good starting points if you don't know about it.
Among other things, a Specification needs to be able to "apply" itself to a Doctrine QueryBuilder. While doing so, it might need to add conditions and parameters to a query or perform joins. In both of these cases, the Specification needs to come up with unique names for parameters or join aliases. The challenge with these unique names is that the Specification itself cannot know which other Specifications are applied to the same QueryBuilder, and even several instances of the same Specification class may be applied at once.
In some cases I've seen, Specifications even need to make these aliases publicly available, and have to do so before they get hold of the QueryBuilder that they'll be applied to. (For example, a Specification might express "eager load a particular entity", and another Specification might build on top of it and need the alias.) This defeats an approach where you could use a central \SplObjectStorage to keep track of alias names per QueryBuilder instance, because you don't even have the QueryBuilder yet.
One obvious solution is to use uniqid() for the parameter and alias names. But this has a serious drawback: it defeats the Query Cache, since repeated requests for even the very same page will result in different DQL every time. Depending on the cache implementation, entries will either be evicted again and again (APCu behavior, to my knowledge), or the cache will gradually fill up disk space and ultimately use up all available OPcache memory with cached but never reused PHP files (e.g. default Symfony Cache pools).
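To make the cache problem concrete, here is a minimal sketch in plain PHP without Doctrine. `buildDql()` is a hypothetical stand-in for "apply a Specification and fetch the DQL", and the entity name is made up. It only shows that the DQL text differs between two otherwise identical requests, so anything keyed on that text cannot produce a hit.

```php
<?php

// Hypothetical stand-in for a Specification adding one condition to a query.
function buildDql(): string
{
    $param = uniqid('p_'); // time-based, so different on every call

    return 'SELECT u FROM App\Entity\User u WHERE u.status = :' . $param;
}

// Two "requests" for the very same page.
$first = buildDql();
usleep(1000);
$second = buildDql();

// Same query shape, different text: a cache keyed on the DQL
// (or a hash of it) treats these as two unrelated queries.
$sameDql = ($first === $second); // expected to be false
$sameKey = (sha1($first) === sha1($second)); // expected to be false
```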
Another solution would be to use something like static variables to generate identifiers based on a counter that re-starts from 0 upon every request. That way, chances are that "the very same request" will generate identical queries again; but of course, conditional code paths that execute additional Specifications - even in completely unrelated queries - break things again.
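A small sketch of that counter idea and of how it breaks, again in plain PHP. `AliasCounter` and `buildPageDql()` are hypothetical names used only for illustration, and `reset()` exists just to simulate the start of a new request.

```php
<?php

// Hypothetical per-request counter; not part of Doctrine.
final class AliasCounter
{
    private static int $next = 0;

    public static function create(): string
    {
        return 'alias_' . self::$next++;
    }

    // Only needed in this sketch to simulate a fresh request.
    public static function reset(): void
    {
        self::$next = 0;
    }
}

// Stand-in for "apply some Specifications and return the resulting DQL".
function buildPageDql(bool $runUnrelatedQueryFirst): string
{
    AliasCounter::reset();

    if ($runUnrelatedQueryFirst) {
        // A Specification in a completely different query consumes a number.
        AliasCounter::create();
    }

    return 'SELECT u FROM App\Entity\User u JOIN u.groups ' . AliasCounter::create();
}

$plain = buildPageDql(false);     // ends in "alias_0"
$again = buildPageDql(false);     // identical text, so the cache can hit
$withDetour = buildPageDql(true); // ends in "alias_1", so a new cache entry
```

The query itself did not change in the third call; only the numbering did, and that is enough to produce different DQL.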
So, what I would like to discuss here is if and how we could support this from the ORM side.
One approach I had in mind was to run a "filter" somewhere around Query::_parse(): if all uniqid()-based identifiers followed a particular naming convention, those identifiers could be replaced with more stable names that make caching possible again. Yes, this could only be text based, since we want to avoid parsing the DQL in the first place, and so it carries the risk of wrong matches; and yes, it would also have to keep a substitution map around to rename parameters, in case the same query is executed multiple times.
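As a rough illustration of the text-based filter, the following sketch works on plain DQL strings only. The function name `stabilizeIdentifiers()`, the `_tmp_` prefix and the `stable_N` naming are assumptions made up for this example. It returns both the rewritten DQL and the substitution map that would be needed to rename parameters.

```php
<?php

/**
 * Hypothetical helper: rewrites temporary identifiers to positional ones.
 *
 * @return array{0: string, 1: array<string, string>} rewritten DQL and substitution map
 */
function stabilizeIdentifiers(string $dql, string $prefix = '_tmp_'): array
{
    // uniqid() appends 13 hex characters to the given prefix.
    $pattern = '/\b' . preg_quote($prefix, '/') . '[a-f0-9]{13}\b/';
    preg_match_all($pattern, $dql, $matches);

    // Number identifiers by order of first appearance in the DQL text.
    $map = [];
    foreach (array_values(array_unique($matches[0])) as $position => $temporary) {
        $map[$temporary] = 'stable_' . $position;
    }

    return [strtr($dql, $map), $map];
}

// The same query as two different requests might have built it.
$template = 'SELECT u FROM App\Entity\User u JOIN u.groups %1$s WHERE %1$s.name = :%2$s';
$requestA = sprintf($template, '_tmp_65a1b2c3d4e5f', '_tmp_65a1b2c3d4e60');
$requestB = sprintf($template, '_tmp_65a1b2c9f0a11', '_tmp_65a1b2c9f0a12');

[$dqlA, $mapA] = stabilizeIdentifiers($requestA);
[$dqlB, $mapB] = stabilizeIdentifiers($requestB);

// Both now read:
// SELECT u FROM App\Entity\User u JOIN u.groups stable_0 WHERE stable_0.name = :stable_1
```

Because the match is purely textual, a string literal in the DQL that happens to follow the naming convention would be rewritten as well; that is the "wrong matches" risk mentioned above.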
This is just a first idea. If you have other ideas, or completely different approaches to the problem, please share them.
Why not a number to increment? "p1", "p2", "p3", ... That's the way we do it for column aliases in the persisters, for example.
Who could issue/give out these numbers? Not the individual Specifications (requires coordination between all Specifications for a particular query, possibly even before the common QueryBuilder is created).
Query::useQueryCache(false) could be used to turn off the Query Cache. Performance-wise it does not make a difference when uniqid()-based identifiers/names are used, but at least we'd not fill up the cache with junk.
Problem: Specifications operate on the QueryBuilder, and there is currently no way to set this for the Query through the QueryBuilder.
Maybe a query hint, possibly even a default one set on the EntityManager, could be used to trigger something like a generic "DQL pre-processor" before the Parser is activated; or even control the actual Parser class used?
Care must be taken when renaming things in the DQL, because parameter names have to be renamed as well. Parameter processing is independent of the Parser, although it at least happens based on the ParserResult.
Here's a wacky helper class that... well... solves it? The main drawback is that the replacement of temporary, uniqid()-based identifiers happens with a naïve string replacement. This is because we cannot afford real DQL parsing at this stage.
Usage:
Whenever you want to join something or add a parameter to the QueryBuilder, call SpecificationIdProvider::createAlias() to create a unique identifier.
When you're finished building the DQL query (or done with the QueryBuilder), call SpecificationIdProvider::cleanup($yourQueryOrQueryBuilder) to obtain a "clean" query where all identifiers will be stable across requests.
<?php

use Doctrine\ORM\Query;
use Doctrine\ORM\QueryBuilder;

class SpecificationIdProvider
{
    /** Creates a temporary identifier for a join alias or parameter name. */
    public static function createAlias(): string
    {
        return uniqid('_unique_alias_');
    }

    /**
     * Returns a copy of the query with temporary identifiers replaced by stable ones.
     *
     * @param Query|QueryBuilder $query
     */
    public static function cleanup($query): Query
    {
        if ($query instanceof QueryBuilder) {
            $query = $query->getQuery();
        }

        $dql = $query->getDQL();

        // uniqid() appends 13 hex characters to the prefix.
        preg_match_all('/\b_unique_alias_[a-f0-9]{13}\b/', $dql, $matches);

        // An identifier that occurs several times keeps the index of its last occurrence.
        $map = [];
        foreach ($matches[0] as $key => $alias) {
            $map[$alias] = 'alias_' . $key;
        }

        $cloneQuery = clone $query; // resets hints and parameters, @see \Doctrine\ORM\AbstractQuery::__clone()
        $cloneQuery->setDQL(str_replace(array_keys($map), $map, $dql));

        // Restore the hints that were reset by cloning.
        foreach ($query->getHints() as $name => $value) {
            $cloneQuery->setHint($name, $value);
        }

        // Re-bind the parameters under their new, stable names.
        foreach ($query->getParameters() as $parameter) {
            $name = $parameter->getName();
            $newName = $map[$name] ?? $name;
            $cloneQuery->setParameter($newName, $parameter->getValue(), $parameter->getType());
        }

        return $cloneQuery;
    }
}
Just came across the fact that one can add Criteria to a QueryBuilder instance through QueryBuilder::addCriteria().
The values to compare against are placed in parameters, and the parameter names are derived based on the alias and field name being referred to by the expression:
https://github.com/doctrine/orm/blob/4baa7bd25218f363dc18286a9e1d32e2148b7fee/src/Query/QueryExpressionVisitor.php#L97-L104
Not sure if that would help in this case here, but might be worth a more detailed look.
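To illustrate what "derived from the alias and field name" could look like, here is a simplified sketch. It is not the ORM's actual code: `deriveParameterName()` and its collision rule (appending the number of names already taken) are assumptions for illustration, and the linked source should be checked for the real behavior.

```php
<?php

/**
 * Hypothetical sketch: derive a parameter name from alias and field,
 * and disambiguate it if that name is already in use.
 *
 * @param list<string> $taken parameter names already used in the query
 */
function deriveParameterName(string $alias, string $field, array $taken): string
{
    // Dots are not valid in parameter names, so flatten them.
    $name = $alias . '_' . str_replace('.', '_', $field);

    if (in_array($name, $taken, true)) {
        $name .= '_' . count($taken);
    }

    return $name;
}

$first = deriveParameterName('u', 'status', []);        // "u_status"
$second = deriveParameterName('u', 'status', [$first]); // "u_status_1"
```

Names built this way depend only on what is compared and in which order, not on the time of the request, so the same set of conditions would yield the same DQL again.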