Do not differentiate sites by "domain" but use TYPO3 sites for this

Open baschny opened this issue 2 years ago • 8 comments

Currently EXT:solr treats a "site" as "everything on the same domain". This is a legacy of using sys_domain records. Since TYPO3 v9 there is a different concept of "site": a page tree with a common "root PID". The domain is just part of the "Base URL Prefix" used when generating URLs, so for example these could all be different sites:

  • site=main pid=1000 baseURL=https://www.example.com/
  • site=site-2 pid=2000 baseURL=https://www.example.com/site-2/
  • site=side pid=3000 baseURL=https://www.example.net/
  • site=site-4 pid=4000 baseURL=https://www.example.net/site-4/

The Solr extension would consider main and site-2 to be the same site, because it only looks at the domain to calculate the siteHash, which is the major factor in deciding which results to display to the user in the frontend (thus it will mix search results for these two sites).
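To make the collision concrete, here is a minimal sketch in plain PHP. It assumes the site hash is a SHA-1 over the host name plus a secret. The function names, the secret, and the exact hash inputs are illustrative assumptions and not EXT:solr's actual code.

```php
<?php
declare(strict_types=1);

// Hypothetical: hash derived only from the host of the base URL.
function hashByDomain(string $baseUrl, string $secret): string
{
    $host = (string)parse_url($baseUrl, PHP_URL_HOST);
    return sha1($host . $secret);
}

// Hypothetical alternative: hash derived from the site's root page ID.
function hashByRootPid(int $rootPid, string $secret): string
{
    return sha1('pid:' . $rootPid . $secret);
}

$secret = 'example-secret'; // placeholder value, assumption for this sketch

// The two sites from the list above that share a host name.
$main  = ['pid' => 1000, 'base' => 'https://www.example.com/'];
$site2 = ['pid' => 2000, 'base' => 'https://www.example.com/site-2/'];

// Same host, so the domain-based hashes are identical.
$domainCollision = hashByDomain($main['base'], $secret) === hashByDomain($site2['base'], $secret);

// Different root PIDs, so the PID-based hashes differ.
$pidCollision = hashByRootPid($main['pid'], $secret) === hashByRootPid($site2['pid'], $secret);

var_dump($domainCollision, $pidCollision);
```

With a host-only hash, main and site-2 cannot be told apart; any input that differs per site (root PID, site identifier, full base URL) separates them.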

So this is basically a bug report, but also a feature request: if we change this concept, some things (or sites relying on this "misbehaviour") will probably break. In the long run, though, it would help us know what we are dealing with when talking about a "Site".

Maybe I have just overlooked something; if so, please correct me!

baschny, Jan 19 '23 13:01

That would also fix some use cases with baseVariants. See #2846 #2578

avogt1701, Jan 24 '23 14:01

Reading through these other issues:

Maybe a generic solution that covers all these "wishes" would be to make it configurable which information is used to generate the "site hash":

  • for some only the root PID is sufficient
  • for some only the domain name is sufficient (current default)
  • for some a combination of domain name and PID
  • for some a combination of base URL and PID
  • for others the combination of PID and TYPO3_CONTEXT
  • etc

You only have to decide which is the "default" one and document what it means when this is changed.
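A minimal sketch of what such a configurable choice could look like, in plain PHP 8. The function name and the strategy names are hypothetical and do not exist in EXT:solr; the sketch only shows how one setting could select the string that is then hashed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the string that would be fed into the site
 * hash, depending on a configured strategy name (names are assumptions).
 */
function siteHashSource(string $strategy, int $rootPid, string $baseUrl, string $context): string
{
    $host = (string)parse_url($baseUrl, PHP_URL_HOST);

    return match ($strategy) {
        'rootPid'       => (string)$rootPid,
        'domain'        => $host,                     // behaviour described as the current default
        'domainAndPid'  => $host . '|' . $rootPid,
        'baseUrlAndPid' => $baseUrl . '|' . $rootPid,
        'pidAndContext' => $rootPid . '|' . $context, // e.g. TYPO3_CONTEXT
        default         => throw new InvalidArgumentException('Unknown strategy: ' . $strategy),
    };
}

// With "domain", main and site-2 from the first comment yield the same source string.
$mainByDomain  = siteHashSource('domain', 1000, 'https://www.example.com/', 'Production');
$site2ByDomain = siteHashSource('domain', 2000, 'https://www.example.com/site-2/', 'Production');

// With "domainAndPid", they differ.
$mainByBoth  = siteHashSource('domainAndPid', 1000, 'https://www.example.com/', 'Production');
$site2ByBoth = siteHashSource('domainAndPid', 2000, 'https://www.example.com/site-2/', 'Production');
```

Changing the strategy on an existing installation changes every hash, so already indexed documents would need to be re-indexed; that is the part that would have to be documented.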

baschny, Jan 24 '23 14:01

@baschny @avogt1701 @christophlehmann Would a new PSR-14 event perhaps be a better choice than programming all the strategies/variants as settings?

dkd-kaehm, Oct 23 '23 14:10

Yes, I think that can be a suitable solution. Maybe something like this:

SiteHashService: introduce a new method SiteHashService->getSiteHashDomain with a new PSR-14 event. Something like this:

public function getSiteHashDomain(\TYPO3\CMS\Core\Site\Entity\Site $typo3Site): string
{
    // todo: Add an event here that can be used to manipulate the "domain" resolution

    // current default
    return $typo3Site->getBase()->getHost();
}
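The todo in the method above could be filled in roughly as follows. This is a self-contained sketch for PHP 8.1+, not the extension's implementation: the event class name, its methods, and the SimpleDispatcher class (a stand-in for a real PSR-14 dispatcher) are all hypothetical, and the site is reduced to an identifier and a host so the example runs without TYPO3.

```php
<?php
declare(strict_types=1);

// Hypothetical event: carries the value used as the "domain" part of the site hash.
final class AfterSiteHashDomainResolvedEvent
{
    public function __construct(
        private readonly string $siteIdentifier,
        private string $domain,
    ) {}

    public function getSiteIdentifier(): string { return $this->siteIdentifier; }
    public function getDomain(): string { return $this->domain; }
    public function setDomain(string $domain): void { $this->domain = $domain; }
}

// Minimal stand-in for a PSR-14 dispatcher: calls each listener, returns the event.
final class SimpleDispatcher
{
    /** @var list<callable> */
    private array $listeners = [];

    public function addListener(callable $listener): void
    {
        $this->listeners[] = $listener;
    }

    public function dispatch(object $event): object
    {
        foreach ($this->listeners as $listener) {
            $listener($event);
        }
        return $event;
    }
}

// Sketch of the proposed method: the host is the default, listeners may override it.
function getSiteHashDomain(SimpleDispatcher $dispatcher, string $siteIdentifier, string $host): string
{
    $event = new AfterSiteHashDomainResolvedEvent($siteIdentifier, $host);
    $dispatcher->dispatch($event);
    return $event->getDomain();
}

// Example listener: make the value unique per site by appending the site identifier.
$dispatcher = new SimpleDispatcher();
$dispatcher->addListener(static function (object $event): void {
    if ($event instanceof AfterSiteHashDomainResolvedEvent) {
        $event->setDomain($event->getDomain() . '#' . $event->getSiteIdentifier());
    }
});

$resolved = getSiteHashDomain($dispatcher, 'site-2', 'www.example.com');
```

Without any listener the function returns the plain host, so the current default stays unchanged; projects that need per-site separation register a listener instead of relying on a setting.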

Then replace the "domain" resolution with the new method in these places:

SiteRepository->buildTypo3ManagedSite https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteRepository.php#L227

$siteHashService = GeneralUtility::makeInstance(SiteHashService::class);
$domain = $siteHashService->getSiteHashDomain($typo3Site);

SiteHashService->getDomainByPageIdAndReplaceMarkers https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteHashService.php#L106

$domainOfPage = $this->getSiteHashDomain($typo3Site);

SiteHashService->getDomainListOfAllSites https://github.com/TYPO3-Solr/ext-solr/blob/12.0.0/Classes/Domain/Site/SiteHashService.php#L91

$domains[] = $this->getSiteHashDomain($typo3Site);

avogt1701, Oct 23 '23 15:10

We also need to take care of the last remaining eID script. It uses the domain, and the eID middleware runs before the site resolver middleware, so no site is available at that point. The script therefore needs to be turned into a middleware that runs after the site resolver.

A core-friendly site hash strategy as the upcoming default would be nice. From my point of view it could be Site::$identifier + TYPO3_CONTEXT.

christophlehmann, Oct 23 '23 15:10

@baschny What has been your solution up until today?

sorenmalling, Feb 20 '24 10:02

My take on a solution with what we already have present: https://gist.github.com/sorenmalling/15f2e4ba7f9c9bff19592da3f060443c

sorenmalling, Feb 20 '24 12:02

@sorenmalling one potential solution is to index the site identifier too:

plugin.tx_solr.index.queue.pages.fields {
...
    siteIdentifier_stringS = TEXT
    siteIdentifier_stringS.data = site:identifier
}

And use it in the filter:

plugin.tx_solr {
    search {
        query {
            filter {
                # restrict search results to the current site
                currentSite = TEXT
                currentSite.data = site:identifier
                currentSite.wrap = siteIdentifier_stringS:"|"
            }
        }
    }
}

baschny, Feb 23 '24 10:02