prerender-java
PrerenderSeoService does not honor whitelist
I am trying to prerender pages based on URLs in a whitelist regardless of User-Agent.
PrerenderSeoService does not properly honor the whitelist configuration. Lines 111 to 114 of PrerenderSeoService are currently:
if (whiteList != null && !isInWhiteList(url, whiteList)) {
    log.trace("Whitelist is enabled, but this request is not listed; intercept: no");
    return false;
}
...
Nowhere is true explicitly returned if a given URL appears in the whitelist. Instead, the remaining checks are run, and my URLs are improperly caught by the if (!isInSearchUserAgent(userAgent)) check, because I am trying to capture URLs by a whitelist instead of by a search bot User-Agent.
The logic should be partially flipped to return true and prevent any remaining checks:
if (whiteList != null && isInWhiteList(url, whiteList)) {
    log.trace("Whitelist is enabled, and this request is listed; intercept: yes");
    return true;
}
...
The above would short-circuit the remaining checks and return true as expected, regardless of User-Agent.
A suggested solution, proposed by @greengerong, would be to introduce a new flag to suppress User-Agent checking. Adding a whitelist and suppressing User-Agent checking would resolve the use case outlined above.
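As a rough illustration of that flag idea, here is a minimal, self-contained sketch. It is not the actual PrerenderSeoService code: the class name InterceptDecision, the parameter skipUserAgentCheck, the short crawler list, and the regex-based whitelist matching are all assumptions made for the example. With the flag off, the User-Agent check still runs after the whitelist check; with the flag on, a URL that passes the whitelist is intercepted regardless of User-Agent.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class InterceptDecision {

    // Hypothetical stand-in for the configured crawler User-Agent list.
    static final List<String> CRAWLER_AGENTS = List.of("googlebot", "bingbot");

    // Assumption: whitelist entries are regular expressions matched against the whole URL.
    static boolean isInWhiteList(String url, List<String> whiteList) {
        return whiteList.stream()
                .anyMatch(regex -> Pattern.compile(regex).matcher(url).matches());
    }

    // Case-insensitive substring match against the crawler list.
    static boolean isInSearchUserAgent(String userAgent) {
        if (userAgent == null) {
            return false;
        }
        String lower = userAgent.toLowerCase(Locale.ROOT);
        return CRAWLER_AGENTS.stream().anyMatch(lower::contains);
    }

    // skipUserAgentCheck is the hypothetical new flag discussed above.
    static boolean shouldIntercept(String url, String userAgent,
                                   List<String> whiteList, boolean skipUserAgentCheck) {
        // Whitelist configured but URL not listed: never intercept.
        if (whiteList != null && !isInWhiteList(url, whiteList)) {
            return false;
        }
        // Flag set: the User-Agent check is skipped entirely.
        if (skipUserAgentCheck) {
            return true;
        }
        // Default behavior: the User-Agent check still runs after the whitelist check.
        return isInSearchUserAgent(userAgent);
    }
}
```

With a whitelist of "/landing/.*", a browser request for /landing/a would be intercepted only when skipUserAgentCheck is true, while a request for a URL outside the whitelist is never intercepted, whatever the flag or User-Agent. Note that in this sketch, setting the flag without configuring a whitelist would intercept every request.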
The current implementation of the whitelist is correct. It should always still run the user agent check after the code has determined that the URL passes the whitelist check.
What is your use case for suppressing user agent checking? We don't recommend serving prerendered pages to users when using our prerendering service, since we remove the script tags to prevent crawlers from trying to execute the JavaScript and clearing out the static HTML that has been prerendered.
Thank you for the clarification. This makes sense. I am looking to pre-render a series of pages for users - basically perform server-side rendering of javascript templated pages. I was hoping prerender would provide a snapshot of the DOM after javascript activity had settled and maintain script references to handle user actions. Am I thinking of the problem wrong, or is there another way to handle this use case with prerender? Thank you!
Happy to help!
The problem here is the Prerender server will load all requests for a page, so it can take at least a few seconds to render the page in the best case scenario. That means your users could see a really slow page load when they could be rendering the javascript themselves much faster in their browser.
Also, if you leave the <script> tags on the page, the browser will run those script tags, which will most likely just clear out the app root and start again from nothing. So your users would see a flash of the prerendered page, and then it could disappear while their browser tried to re-execute the JavaScript.
So if you are looking for real server-side rendering, you'll want to use a product that serializes the state when rendered on the server side and passes that state to be deserialized on the client side, instead of re-rendering the page from scratch. That's a little more baked into SSR with frameworks like Angular and React. And that's the reason why we only suggest serving prerendered pages to the crawlers if using our Prerender server. The crawlers don't need the JavaScript since they don't interact with the page; they just read the HTML.
Let me know if you have any questions about that.
This perspective is very helpful. I have been operating under the assumption that Prerender could also be used to produce human-consumable SSRed pages. In our set-up, we have multiple layers of page caching, and we prime the cache before we start serving traffic from a given server, so I am less concerned about time to render.
The removal of <script> tags is interesting. We actually place logic in our apps that checks for the presence of server-side rendered content, so the app does not remount on load. I have been researching generic solutions for performing SSR, and Prerender stood out as a potential option. Do you know if there is a configuration option on the Prerender server that could be used to retain the <script> tags? I am planning on setting up an internal instance of the server.
Or do you feel that I may be attempting to use Prerender in a way that is too far off from its original intent?
Again, thank you very much for your help.
You can preserve the <script> tags if you run your own prerender server and just don't add the plugin in server.js to remove script tags. Removing the script tags is optional when running your own prerender server.
That being said, this middleware repo is meant to be a drop-in config for use with https://prerender.io and our hosted service, which removes <script> tags from the page to serve to the crawlers. So ideally this repo would stay that way. If you are able to fork this repo and add your change for your specific use case, I think that's the best way forward.