wp-rocket
wp-rocket copied to clipboard
Handle different error_code coming from RUCSS insted of just 408
RUCSS is going to send different codes instead of only 408 if the page requires a new RUCSS regeneration issue in RUCSS
407->Protocol error (Target.setAutoAttach): Session closed 408->Navigation timeout 409->Browser couldn't open a new page
We need to change the code to handle the new error codes https://github.com/wp-media/wp-rocket/blob/536ddfa9d318073b4018b068e39151a3141b7d2c/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php#L657
Hi @mostafa-hisham, I would suggest to take into account Gustav's proposal not to use custom HTTP error codes: Even if it does the job for the feature, it can have bad impact on Ops monitoring (they usually monitor service performances based on standardized error codes) and it would not be a good practice.
The goal of the task is to get more details on the errors in the metric system, right? Can the metric system access something else than the "code" field of the response? For instance, we could monitor the "message" field? That could be a good quick win without having to modify the error code mechanism, which will be a bit more tricky (on that topic, which can be dealt with later on, I'll open a dedicated conversation on slack to sort it out).
Moving issue to "Blocked" while we figure out a solution.
discussion on-going here
Following Slack discussion, moving back to "To Grooming" as needs and spec. evolved. Here are the
- Must provide an error categorization system to ease operations on failed tasks for new versions of WP Rocket.
- Must remain backward compatible with older WP Rocket versions, especially the "reschedule on error" feature.
- Must provide an error categorization system that can be easily refined without breaking backward compatibility.
- Should refactor the error categorization system to avoid relying on error codes too similar to HTTP response codes.
Several approaches were mentioned and should be investigated during grooming:
- Send the WP Rocket version when scheduling a job, so that RUCSS can handle backward compatibility.
- Migrate from error codes being numbers (like 500, 408) to an enum of standardized error reasons.
- Return a "could_retry" field in case of errors on RUCSS to notify the plugin that the job should be re-scheduled, when it is the case (previously done with 408)
@MathieuLamiot @piotrbak @DahmaniAdame I think that sending the WPR version to RUCSS and changing the response based on that is going to be a lot of work and will make the code more complicated
Scope a solution
I thinkcould_retry
flag is a good idea we could send it back with 408 status for the next upcoming versions as my discussion with @MathieuLamiot in Slack and modify the plugin code to check both 408 or the flag and store whatever message or error code we have
then after that we can decide how the error_code we send back should look and the plugin will not depend on them to retry sending a new job but will depend on the could_retry
flag
we need to send back the could_retry
flag with each 408 response in the following
https://github.com/wp-media/nodejs-treeshaker/blob/c9fd859c46c5ec5d981abcbb16b47594e92a2223/rucss/libs/rucss-manager.js#L73
https://github.com/wp-media/nodejs-treeshaker/blob/c6a67058b3889ee794b9e21866c8bbda16ab23b7/rucss/libs/puppeteer-manager.js#L267
https://github.com/wp-media/nodejs-treeshaker/blob/dbd2f0670d72eaab162d3fc8f967c43a42948853/rucss/processor.js#L77
and on the plugin side, we should change the code to check for the could_retry
or status 408 to retry new jobs here
https://github.com/wp-media/wp-rocket/blob/219a0989dc6ec41db29b47ac38f0869da57a3e86/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php#L680
Estimate the effort
Effort S
The solution proposed by @mostafa-hisham is a first step so that we can later on implement error categorization. I created this task as a follow-up/placeholder to actually implement the error categorization if needed.
I added the proposition into an issue from the RUCSS as I detail technical parts inside it and this issue is one from wpr aka a public one. https://github.com/wp-media/nodejs-treeshaker/issues/421
We are going to do what @CrochetFeve0251 said here
@MathieuLamiot At the end we agreed to go on a mix of both solution.
@mostafa-hisham make me noticed the error code weren't HTTP standards code so using them is a bad pattern. Due to that we decided to use the flag as output
could_retry
value in the final response.On another hand to keep the change simple for the moment we decided to not touch the business logic inside the worker and continue to return 408 for the moment. That also make sense as the retry flag doesn't seems to part of the business logic but more an use case for the plugin needs and modify only the endpoint to adapt the current 408 error response to add the flag seems more appropriate.
For that inside the handler we will add the following logic:
.then((data) => { if ( data.returnvalue.code === 408) { data.returnvalue.could_retry = data.error; } return data; })
and on the plugin side, we should change the code to check for the could_retry
or status 408 to retry new jobs here
https://github.com/wp-media/wp-rocket/blob/219a0989dc6ec41db29b47ac38f0869da57a3e86/inc/Engine/Optimization/RUCSS/Controller/UsedCSS.php#L680
Estimate the effort
Effort S
We'll create Epic based on this conversation: https://wp-media.slack.com/archives/C06CQPWEJSK/p1720675770855359?thread_ts=1712147619.207269&cid=C06CQPWEJSK