3.17 - Frontend process

Open Miraeld opened this issue 1 year ago • 5 comments

Description

Implement a new Frontend Controller for WP Rocket to handle the lazy-render-content (LRC) feature. The controller will manage the lazy-rendering of content, following the design and file structure of the existing LCP (Largest Contentful Paint) controller but tailored to lazy-rendered content. We will implement three solutions (DOMDocument, Regex, SimpleHtmlDom) to compare their results.

Scope a solution

  1. Define the Frontend Controller:

    • Create a new controller in the WP_Rocket\Engine\Optimization\LazyRenderContent\Frontend namespace.
    • This controller will include methods to apply lazy-rendering optimizations to the content.
  2. Implement the Controller Class:

    • First, create the optimize method required by the controller interface. It can stay empty for now, as we will attend to it here later.
    • Create a method, possibly named create_hash, that modifies the buffer and adds hashes to eligible elements.
    • Add a conditional check: only add the hashes if LRC data does not exist in the DB.
    • Utilize the methods from the prototype PR for the new frontend controller:
    • Controller: Handles HTML processing using three methods: DOMDocument, Regex, and SimpleHtmlDom.
    <?php
    declare(strict_types=1);
    
    namespace WP_Rocket\Engine\Optimization\LazyRenderContent\Frontend;
    
    use DOMDocument;
    use voku\helper\HtmlDomParser;
    use voku\helper\SimpleHtmlDomBlank;
    
    class Controller {
        public function add_locations_hash_to_html_dom( $html ) {
            $dom = new DOMDocument();
            @$dom->loadHTML( $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    
            $body = $dom->getElementsByTagName( 'body' )->item( 0 );
    
            if ( ! $body ) {
                return $html;
            }
    
            $this->add_hash_to_element_dom( $body, 2 );
            return $dom->saveHTML();
        }
    
        private function add_hash_to_element_dom( $element, $depth ) {
            if ( $depth < 0 ) {
                return;
            }
    
            // Allow-list: only these tags receive a hash (the variable name is misleading).
            $skip_tags = [
                'DIV',
                'MAIN',
                'FOOTER',
                'SECTION',
                'ARTICLE',
                'HEADER',
            ];
    
            static $count = 0;
    
            foreach ( $element->childNodes as $child ) {
                if (
                    XML_ELEMENT_NODE !== $child->nodeType
                    ||
                    ! in_array( strtoupper( $child->tagName ), $skip_tags, true )
                ) {
                    continue;
                }
    
                $child_html = $child->ownerDocument->saveHTML( $child );
                $opening_tag_html = strstr( $child_html, '>', true ) . '>';
    
                $hash = md5( $opening_tag_html . $count );
    
                ++$count;
    
                $child->setAttribute( 'data-rocket-location-hash', $hash );
    
                $this->add_hash_to_element_dom( $child, $depth - 1 );
            }
        }
    
        public function add_locations_hash_to_html_regex( $html ) {
            $result = preg_match( '/(?><body[^>]*>)(?>.*?<\/body>)/is', $html, $matches );
    
            if ( ! $result ) {
                return $html;
            }
    
            return $this->add_hash_to_element_regex( $html, $matches[0], 2 );
        }
    
        private function add_hash_to_element_regex( $html, $element, $depth ) {
            if ( $depth < 0 ) {
                return $html;
            }
    
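            // Allow-list: only these tags receive a hash (the variable name is misleading).
            // Unlike the DOM variants, this method does not recurse: it matches these tags at any nesting level.
            $skip_tags = [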
                'div',
                'main',
                'footer',
                'section',
                'article',
                'header',
            ];
    
            $result = preg_match_all( '/(?><(' . implode( '|', $skip_tags ) . ')[^>]*>)/is', $element, $matches, PREG_SET_ORDER );
    
            if ( ! $result ) {
                return $html;
            }
    
            $count = 0;
    
            foreach ( $matches as $child ) {
                $opening_tag_html = strstr( $child[0], '>', true ) . '>';
    
                $hash = md5( $opening_tag_html . $count );
    
                ++$count;
    
                $replace = preg_replace( '/' . $child[1] . '/is', '$0 data-rocket-location-hash="' . $hash . '"', $child[0], 1 );
    
                $html = preg_replace( '/' . preg_quote( $child[0], '/' ) . '/', $replace, $html, 1 );
            }
    
            return $html;
        }
    
        public function add_locations_hash_to_html_simple_html_dom( $html ) {
            $dom = HtmlDomParser::str_get_html( $html );
    
            $body = $dom->getElementByTagName( 'body' );
    
            if ( $body instanceof SimpleHtmlDomBlank ) {
                return $html;
            }
    
            $this->add_hash_to_element_simple_html_dom( $body, 2 );
    
            return $dom->save();
        }
    
        private function add_hash_to_element_simple_html_dom( $element, $depth ) {
            if ( $depth < 0 ) {
                return;
            }
    
            // Allow-list: only these tags receive a hash (the variable name is misleading).
            $skip_tags = [
                'DIV',
                'MAIN',
                'FOOTER',
                'SECTION',
                'ARTICLE',
                'HEADER',
            ];
    
            static $count = 0;
    
            foreach ( $element->childNodes() as $child ) {
                if ( ! in_array( strtoupper( $child->getTag() ), $skip_tags, true ) ) {
                    continue;
                }
    
                $child_html = $child->html();
                $opening_tag_html = strstr( $child_html, '>', true ) . '>';
    
                $hash = md5( $opening_tag_html . $count );
    
                ++$count;
    
                $child->setAttribute( 'data-rocket-location-hash', $hash );
    
                $this->add_hash_to_element_simple_html_dom( $child, $depth - 1 );
            }
        }
    }
    
    • Create WP_Rocket\Engine\Optimization\LazyRenderContent\Frontend\Subscriber
    • Hook the rocket_buffer event to a callback that calls the controller's create_hash method. The priority can be set to 16 or 17.
  3. ServiceProvider and Subscriber Integration:

    • Update the ServiceProvider to register the new lazy-render-content controller and Subscriber.
    • Configure the Subscriber class to use the new controller and processor logic.
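
The Subscriber wiring described in steps 2 and 3 could look roughly like the sketch below. It is self-contained, so SketchController and SketchSubscriber are hypothetical stand-ins rather than the real classes. The has_lrc_data callable, the add_hashes callback name, and the str_replace placeholder are assumptions made for illustration. Only rocket_buffer, create_hash, the priority of 16, and the "skip when LRC data exists" rule come from this issue, and the get_subscribed_events() shape assumes WP Rocket's usual subscriber convention.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the prototype Controller.
 * Only create_hash() and the "skip when LRC data exists" rule come from the issue.
 */
class SketchController {
    /** @var callable Returns true when LRC data already exists in the DB (assumed lookup). */
    private $has_lrc_data;

    public function __construct( callable $has_lrc_data ) {
        $this->has_lrc_data = $has_lrc_data;
    }

    public function create_hash( string $html ): string {
        if ( ( $this->has_lrc_data )() ) {
            return $html; // Data already collected: leave the buffer untouched.
        }

        return $this->add_location_hashes_placeholder( $html );
    }

    // Placeholder for add_locations_hash_to_html_dom() from the prototype.
    private function add_location_hashes_placeholder( string $html ): string {
        $hash = md5( '<main>0' );

        return str_replace( '<main>', '<main data-rocket-location-hash="' . $hash . '">', $html );
    }
}

/** Hypothetical stand-in for the Frontend\Subscriber class. */
class SketchSubscriber {
    /** @var SketchController */
    private $controller;

    public function __construct( SketchController $controller ) {
        $this->controller = $controller;
    }

    // Assumed convention: hook name => [ callback method, priority ].
    public static function get_subscribed_events(): array {
        return [
            'rocket_buffer' => [ 'add_hashes', 16 ],
        ];
    }

    public function add_hashes( string $html ): string {
        return $this->controller->create_hash( $html );
    }
}
```

Passing the DB lookup in as a callable keeps the sketch testable without a database. The real controller would more likely receive a query object from the ServiceProvider.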
  4. Testing and Validation:

    • Test the new controller to ensure it applies lazy-rendering correctly using all three methods without causing performance issues.
    • Validate that the data-rocket-location-hash attribute is added appropriately to the target elements.
    • Compare the outputs of the three methods to determine the best approach.
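
One possible way to compare the three outputs is to reduce each processed document to an ordered list of tagged elements, so that serialization differences between the methods do not get in the way. The sketch below assumes a hypothetical helper, extract_location_hashes(), which is not part of the prototype. The two sample strings are stand-ins for real controller output.

```php
<?php
declare(strict_types=1);

/**
 * Collects [ tag, hash ] pairs, in document order, from processed HTML.
 * Hypothetical helper for testing only; not part of the prototype.
 */
function extract_location_hashes( string $html ): array {
    preg_match_all(
        '/<([a-z0-9]+)\b[^>]*\sdata-rocket-location-hash="([a-f0-9]{32})"/i',
        $html,
        $matches,
        PREG_SET_ORDER
    );

    return array_map(
        static function ( array $match ): array {
            return [ strtolower( $match[1] ), $match[2] ];
        },
        $matches
    );
}

// Stand-in outputs; in practice these would come from the three controller methods.
$dom_output   = '<body><div data-rocket-location-hash="' . md5( 'a' ) . '" class="x"></div></body>';
$regex_output = '<body><div data-rocket-location-hash="' . md5( 'a' ) . '" class="x"></div></body>';

$same = extract_location_hashes( $dom_output ) === extract_location_hashes( $regex_output );
```

The hash values may legitimately differ between methods, because each one can serialize the opening tag differently and the counters are not shared. In that case, comparing only the tags with array_column( $pairs, 0 ) may be the more useful check.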

For more detailed implementation and reference, please check the prototype PR.

Miraeld avatar Aug 01 '24 07:08 Miraeld

@Miraeld @jeawhanlee While this is an important first step, this is far from completing the front-end part of the feature. This issue seems mostly to be about integrating the prototype into a suitable structure for 3.17. It might need more details, such as:

  • those methods must not be called if LRC is not activated (based on activation/context?)

Here are the missing points that would probably need dedicated issues for follow-up:

  • When LRC data is available for this URL and screen size, then the controller must:
    • apply the location hash (as per this issue)
    • look for added location hashes matching the ones in the DB and replace them with the lazy-render attribute (see here)
    • remove remaining location hashes
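
A minimal sketch of the last two steps, assuming the location hashes have already been applied to the buffer. The apply_lazy_render() helper and its data-lazy-render-placeholder attribute name are hypothetical, since the real lazy-render attribute is not spelled out in this thread.

```php
<?php
declare(strict_types=1);

/**
 * Replaces location hashes found in $db_hashes with a lazy-render attribute
 * and strips every other location hash.
 * Hypothetical helper; $lazy_attribute is a placeholder name, not the real attribute.
 */
function apply_lazy_render(
    string $html,
    array $db_hashes,
    string $lazy_attribute = 'data-lazy-render-placeholder'
): string {
    return (string) preg_replace_callback(
        '/\sdata-rocket-location-hash="([a-f0-9]{32})"/i',
        static function ( array $match ) use ( $db_hashes, $lazy_attribute ): string {
            // Known hash: swap in the lazy-render attribute. Unknown hash: drop it.
            return in_array( $match[1], $db_hashes, true )
                ? ' ' . $lazy_attribute . '="1"'
                : '';
        },
        $html
    );
}
```

A single pass handles both the replacement and the cleanup, so no location hash can survive in the final output.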

Can you create this issue and groom it?


@Miraeld @jeawhanlee @wp-media/qa-team I think the AC for this one are exactly the ones for the prototype we already have, right?

MathieuLamiot avatar Aug 01 '24 19:08 MathieuLamiot

@MathieuLamiot I think we can add the missing parts here as I don't think it's worth opening a dedicated git issue, WDYT?

jeawhanlee avatar Aug 05 '24 11:08 jeawhanlee

@jeawhanlee Thanks for raising the point. I feel like the missing part (applying the lazy render attribute) will be a complex thing, requiring a few days of work. On the other hand, if we have a frontend controller that only does the part where there is no data in the DB (adding the hashes), then it unlocks many dependencies: we would already be able to test data generation end-to-end: injecting beacon + hashes and getting the hashes in DB after a visit.

So, to avoid having everything blocked because we are waiting for the frontend controller to be completed before we start testing end-to-end, I'd advise going with a dedicated GH issue.

MathieuLamiot avatar Aug 05 '24 12:08 MathieuLamiot

The controller shouldn't be using anything other than DOMDocument, right? The code example uses SimpleHtmlDom, which seems unexpected.

remyperona avatar Aug 06 '24 15:08 remyperona

I discussed this with @jeawhanlee and @Miraeld: the longer we can keep the 3 options (DOMDocument, Regex, SimpleDom) the better as it would allow us to switch to another if we discover a limitation further down the road. However, if maintaining multiple solutions for a few weeks adds complexity/development time, we can drop Regex & SimpleDom already. It looked like we could keep them easily for now by just copy-pasting your prototype.

MathieuLamiot avatar Aug 06 '24 15:08 MathieuLamiot