ext-solr
Do not differentiate sites by "domain" but use TYPO3 sites for this
Currently EXT:solr treats a "site" as everything on the same domain. This is a legacy of using sys_domain records. Since TYPO3 v9 there is a different concept of "site": pages that share a common root PID. The domain is just part of the "Base URL Prefix" used when generating URLs, so for example these could all be different sites:
- site=main pid=1000 baseURL=https://www.example.com/
- site=site-2 pid=2000 baseURL=https://www.example.com/site-2/
- site=side pid=3000 baseURL=https://www.example.net/
- site=site-4 pid=4000 baseURL=https://www.example.net/site-4/
The Solr extension would consider main and site-2 to be the same site, because it only looks at the domain to calculate the siteHash, which is the major factor in deciding which results to display to the user in the frontend (thus it will mix search results for these two sites).
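To illustrate the collision, here is a minimal sketch in plain PHP. It assumes the hash is built from the host name plus a secret, roughly sha1(host . key . 'tx_solr'); the key, the function name and the site array are placeholders for illustration and not the extension's actual code.

```php
<?php
declare(strict_types=1);

// Assumption: placeholder secret, standing in for the TYPO3 encryption key.
const HYPOTHETICAL_KEY = 'not-a-real-encryption-key';

/**
 * Hypothetical helper: hash derived from the host of the base URL only,
 * which is the behaviour described in this issue.
 */
function domainOnlySiteHash(string $baseUrl): string
{
    $host = (string)parse_url($baseUrl, PHP_URL_HOST);
    return sha1($host . HYPOTHETICAL_KEY . 'tx_solr');
}

// The four example sites from above, keyed by site identifier.
$sites = [
    'main'   => 'https://www.example.com/',
    'site-2' => 'https://www.example.com/site-2/',
    'side'   => 'https://www.example.net/',
    'site-4' => 'https://www.example.net/site-4/',
];

// array_map keeps the string keys when a single array is passed.
$hashes = array_map('domainOnlySiteHash', $sites);

foreach ($hashes as $identifier => $hash) {
    echo $identifier, ' => ', $hash, PHP_EOL;
}
```

With a host-only input, main and site-2 end up with the same hash, and so do side and site-4, because the path part of the base URL never enters the calculation.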
So this is basically a bug report, but also a feature request: if we change this concept, some things (or sites relying on this "misbehaviour") might break, but in the long run it would be helpful to know what we are dealing with when talking about a "Site".
Maybe I have just overlooked something; if so, please correct me!
That would also fix some use cases with baseVariants. See #2846 #2578
Reading through these other issues:
Maybe a generic solution that covers all these "wishes" would be to make it configurable which information is used to generate the "site hash":
- for some only the root PID is sufficient
- for some only the domain name is sufficient (current default)
- for some a combination of domain name and PID
- for some a combination of base URL and PID
- for others the combination of PID and TYPO3_CONTEXT
- etc
You only have to decide which one is the "default" and document what it means when this is changed.
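A rough sketch of what such a configurable choice could look like, in plain PHP (8.0 or later). The function name, the strategy names and the array keys are all made up for illustration; nothing like this exists in EXT:solr today.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the string that would be fed into the site hash.
 * The $site keys (rootPageId, baseUrl) and the strategy names are illustrative only.
 */
function siteHashInput(array $site, string $strategy, string $context = 'Production'): string
{
    $host = (string)parse_url($site['baseUrl'], PHP_URL_HOST);

    return match ($strategy) {
        'rootPageId'           => (string)$site['rootPageId'],
        'domain'               => $host, // current default
        'domainAndRootPageId'  => $host . '|' . $site['rootPageId'],
        'baseUrlAndRootPageId' => $site['baseUrl'] . '|' . $site['rootPageId'],
        'rootPageIdAndContext' => $site['rootPageId'] . '|' . $context,
        default => throw new InvalidArgumentException('Unknown strategy: ' . $strategy),
    };
}

$main  = ['rootPageId' => 1000, 'baseUrl' => 'https://www.example.com/'];
$site2 = ['rootPageId' => 2000, 'baseUrl' => 'https://www.example.com/site-2/'];

// Same host, so the domain-only strategy cannot tell the two sites apart.
$sameByDomain = siteHashInput($main, 'domain') === siteHashInput($site2, 'domain');

// Adding the root PID makes the inputs differ.
$sameByDomainAndPid = siteHashInput($main, 'domainAndRootPageId') === siteHashInput($site2, 'domainAndRootPageId');

var_dump($sameByDomain, $sameByDomainAndPid);
```

The point is only that the input to the hash becomes a documented choice; how the setting would be exposed (site configuration, extension configuration, TypoScript) is left open here.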
@baschny @avogt1701 @christophlehmann Would a new PSR-14 event maybe be a better choice than programming all the strategies/variants with settings?
Yes, I think that can be a suitable solution. Maybe something like this:
SiteHashService: introduce a new method SiteHashService->getSiteHashDomain that dispatches a new PSR-14 event. Something like this:
public function getSiteHashDomain(\TYPO3\CMS\Core\Site\Entity\Site $typo3Site): string
{
    // todo: Add an event here that can be used to manipulate the "domain" resolution
    // current default
    return $typo3Site->getBase()->getHost();
}
Replace the "domain" resolution with the new method
SiteRepository->buildTypo3ManagedSite https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteRepository.php#L227
$siteHashService = GeneralUtility::makeInstance(SiteHashService::class);
$domain = $siteHashService->getSiteHashDomain($typo3Site);
SiteHashService->getDomainByPageIdAndReplaceMarkers https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteHashService.php#L106
$domainOfPage = $this->getSiteHashDomain($typo3Site);
SiteHashService->getDomainListOfAllSites https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteHashService.php#L91
$domains[] = $this->getSiteHashDomain($typo3Site);
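For discussion, here is a self-contained sketch (PHP 8.1 or later) of how the event from the todo above could behave. The event class, its methods and the tiny dispatcher are hypothetical stand-ins so the example runs without TYPO3; in the extension one would presumably inject Psr\EventDispatcher\EventDispatcherInterface and pass the Site object instead of plain values.

```php
<?php
declare(strict_types=1);

/** Hypothetical event; the class name and methods are not part of EXT:solr. */
final class AfterSiteHashDomainResolvedEvent
{
    public function __construct(
        private readonly string $siteIdentifier,
        private readonly int $rootPageId,
        private string $domain,
    ) {}

    public function getSiteIdentifier(): string { return $this->siteIdentifier; }
    public function getRootPageId(): int { return $this->rootPageId; }
    public function getDomain(): string { return $this->domain; }
    public function setDomain(string $domain): void { $this->domain = $domain; }
}

/** Minimal stand-in for a PSR-14 dispatcher, only here to keep the sketch runnable. */
final class TinyDispatcher
{
    /** @var list<callable> */
    private array $listeners = [];

    public function addListener(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    public function dispatch(object $event): object
    {
        foreach ($this->listeners as $listener) {
            $listener($event);
        }
        return $event;
    }
}

function getSiteHashDomain(TinyDispatcher $dispatcher, string $identifier, int $rootPageId, string $baseUrl): string
{
    // current default: host part of the base URL
    $default = (string)parse_url($baseUrl, PHP_URL_HOST);
    $event = $dispatcher->dispatch(
        new AfterSiteHashDomainResolvedEvent($identifier, $rootPageId, $default)
    );
    return $event->getDomain();
}

$dispatcher = new TinyDispatcher();

// Example listener: make the value unique per TYPO3 site by appending the identifier.
$dispatcher->addListener(static function (AfterSiteHashDomainResolvedEvent $event): void {
    $event->setDomain($event->getDomain() . '#' . $event->getSiteIdentifier());
});

$mainDomain  = getSiteHashDomain($dispatcher, 'main', 1000, 'https://www.example.com/');
$site2Domain = getSiteHashDomain($dispatcher, 'site-2', 2000, 'https://www.example.com/site-2/');

echo $mainDomain, PHP_EOL, $site2Domain, PHP_EOL;
```

Without any listener the method keeps returning the host, so existing installations would behave as before; a project that needs per-site separation registers a listener like the one shown.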
We also need to take care of the last remaining eID script. It uses the domain, and the eID middleware runs before the site resolver middleware, so no site is available at that point. The script therefore needs to be turned into a middleware that runs after the site resolver.
A core-friendly site hash strategy as upcoming default would be nice. From my point of view it could be Site::$identifier + TYPO3_CONTEXT.
@baschny What has been your solution up until today?
My take on a solution with what we already have present: https://gist.github.com/sorenmalling/15f2e4ba7f9c9bff19592da3f060443c
@sorenmalling one potential solution is to index the site identifier too:
plugin.tx_solr.index.queue.pages.fields {
    ...
    siteIdentifier_stringS = TEXT
    siteIdentifier_stringS.data = site:identifier
}
And use it in the filter:
plugin.tx_solr {
search {
query {
filter {
# restrict search results to the current site
currentSite = TEXT
currentSite.data = site:identifier
currentSite.wrap = siteIdentifier_stringS:"|"
}
}