imagify-plugin icon indicating copy to clipboard operation
imagify-plugin copied to clipboard

High Image Counts Cause Errors in Media and Bulk Optimization Pages

Open joejoe04 opened this issue 4 years ago • 14 comments

Describe the bug There are some cases, like this one, where a large number of Media Library images causes the Media Library and Bulk Optimization pages to either have errors or not load and make optimization not possible. It seems related to the counting of images and optimization statistics.

To Reproduce Could be difficult to reproduce, but it can happen when there are a large number of Media Library images. In the example above, the user had over 81,000 images and 17 thumbnails for each.

Expected behavior Ideally, we could find a solution to do the image and stat counting in a more efficient way so the large number of images would not cause problems anymore. If that's not possible, we could at least add a filter to allow for limiting the number of images that are counted (we have this for the Media Library page, but not for the Bulk Page).

Additional context We are able to use the following filter to fix these issues in the Media Library and allow the page to load:

function pfx_limit_imagify_optimize_count( $count ) {
        return 1;
}
add_filter( 'imagify_count_optimized_attachments', 'pfx_limit_imagify_optimize_count' );
add_filter( 'imagify_count_attachments', 'pfx_limit_imagify_optimize_count' );
add_filter( 'imagify_count_unoptimized_attachments', 'pfx_limit_imagify_optimize_count' );
add_filter( 'imagify_count_saving_data', 'pfx_limit_imagify_optimize_count' );
add_filter( 'imagify_count_error_attachments', 'pfx_limit_imagify_optimize_count' );

But that's is not entirely optimal, as it causes the filter dropdown to be inaccurate:

image

And it also doesn't work for the Bulk page, so for that I tried using the following:

function pfx_imagify_bulk_stats_remove_all_data() {
	$data = array(
		// Global chart.
		'total_attachments'             => 0,
		'unoptimized_attachments'       => 0,
		'optimized_attachments'         => 0,
		'errors_attachments'            => 0,
		// Stats block.
		'already_optimized_attachments' => 0,
		'original_human'                => 0,
		'optimized_human'               => 0,
	);
        return $data;
}
add_filter( 'imagify_bulk_stats', 'pfx_imagify_bulk_stats_remove_all_data', 10, 2 );

While that did limit the display of the stats to all zeros, it did not allow the user's Bulk page to load, and I think the reason is that even with the above filter applied, the imagify_get_bulk_stats function is still being run in /wp-content/plugins/imagify/inc/functions/admin-stats.php so the stats are still being counted (though not displayed) for the very large number of images (and causing the issue for the user still).

Backlog Grooming (for WP Media dev team use only)

  • [ ] Reproduce the problem
  • [ ] Identify the root cause
  • [ ] Scope a solution
  • [ ] Estimate the effort

joejoe04 avatar Nov 05 '21 15:11 joejoe04

https://wordpress.org/support/topic/slow-media-library-search-pt-2/ https://wordpress.org/support/topic/media-library-search-of-images-slower/

There are many tickets in HelpScout but not tagged properly in the past, example: https://secure.helpscout.net/conversation/2690600261/509386/

General issue behaviour: Error 524, Error 500 on bulk optimization page or Media library. Issue resolves when Imagify is deactivated or above custom code is added(only for some cases)

This, along with https://github.com/wp-media/imagify-plugin/issues/865 should be prioritized considering the number of tickets. @piotrbak

saranshj4 avatar Sep 02 '24 06:09 saranshj4

A mitigation has been released with 2.2.3.2. Further optimization is possible as stated here by doing the SQL request only when needed.

MathieuLamiot avatar Nov 19 '24 06:11 MathieuLamiot

Umm not an easy one.

Scope a solution

First of all, we could add filters to limit the counting like explained in the issue as a safeguard in case there is a problem:

  • In inc/functions/admin-stats.php:
  • Add a filter (imagify_count_unoptimized_attachments_limit) to limit the number of images counted in imagify_count_attachments
  • Add a filter (imagify_count_error_attachments_limit) to limit the number of images counted in imagify_count_error_attachments
  • Add a filter (imagify_count_optimized_attachments_limit) to limit the number of images counted in imagify_count_optimized_attachments
  • Add a filter (imagify_count_unoptimized_attachments_limit) to limit the number of images counted in imagify_count_unoptimized_attachments
  • We would probably need to add these kind of filters in inc/3rd-party/nextgen-gallery/inc/functions/admin-stats.php as well.

Then about finding a real solution, I got two ideas and I'm demanding @wp-media/engineering-plugin-team opinions on them:

Optimizaiton of SQL queries

  • Optimize the SQL queries used for counting attachments and statistics. For example, in inc/functions/admin-stats.php (inc/3rd-party/nextgen-gallery/inc/functions/admin-stats.php), the imagify_count_attachments function can be optimized by reducing the number of joins and conditions in the query.
  • Use indexes on relevant columns in the database to speed up the counting process. This can be done by adding indexes to columns like post_mime_type, post_type, and post_status in the wp_posts table.

Use Background Processing

  • Implement background processing for counting files and statistics. This can be done using the existing background processing classes like Imagify_Abstract_Background_Process in inc/classes/class-imagify-abstract-background-process.php (inc/classes/class-imagify-abstract-background-process.php).
  • Schedule background tasks to count files periodically and store the results in a cache or transient. This way, the counting process does not block the main thread and can be done asynchronously.

Miraeld avatar Dec 01 '24 19:12 Miraeld

@Miraeld Have you checked the suggested direction in my comment above? @remyperona identified that we do the query even if we don't display the upgrade pop-up. His idea was to do it "just-in-time" when displaying the pop-up, as it is the case with other components of the pop-up already. I would not recommend adding indexes to wp_posts as there an be many things there, especially in post_type and post_status ; it's not our table so it's quite risky to manipulate it. About the background processes, it could help, definitely. This is a bit related to @remyperona 's idea.

Moving back to "Grooming in progress".

MathieuLamiot avatar Dec 02 '24 07:12 MathieuLamiot

Yep I checked, And I wanted to add something "more". That's why i offered two ways I thought about to enhance the performances even more. I should have mentioned @remyperona idea, sorry for that.

And finally. I've moved it to ready for review to get a feedback about my ideas. :)

Miraeld avatar Dec 02 '24 07:12 Miraeld

Scope a solution ✅

There's 2 sides to this issue

  • The part mentioned in the issue above relating to the image stats on the bulk optimization and media library screen.
  • Payment modal which loads image stats which in turn performs large query, but this has been optimized to an extent with caching in transient.

Bulk optimization & Media Library Screen

There's no easy way to handle this part but the main problems are:

  • https://github.com/wp-media/imagify-plugin/blob/e046686926c5bf46a73014b7404f2d1850dae27c/inc/functions/admin-stats.php#L12
  • https://github.com/wp-media/imagify-plugin/blob/e046686926c5bf46a73014b7404f2d1850dae27c/inc/functions/admin-stats.php#L70
  • https://github.com/wp-media/imagify-plugin/blob/e046686926c5bf46a73014b7404f2d1850dae27c/inc/functions/admin-stats.php#L121
  • https://github.com/wp-media/imagify-plugin/blob/e046686926c5bf46a73014b7404f2d1850dae27c/inc/functions/admin-stats.php#L233

We can implement caching here, but I'm not sure it will be sufficient, I have 2 options for this:

  • We can create new property in the imagify_settings option or we create a new set of options for this which will hold the attachment counts, attachment error counts, optimized counts etc. This will serve for doing a real time count update of new attachments, optimized attachments, unoptimized attachments, attachments with optimization errors. The way we will handle is to update the option every time a new image is uploaded or after an image is optimized or when there's an error for a image but since this will not be backward compatible, we can run a background process on upgrade to new version that will process the logic of each of the highlighted methods above and save them in the proposed option, then subsequently we'll use the data from the new option.
  • Another option is background process like @Miraeld has mentioned, we can run a cron that will periodically run the count methods and save the results in a new option. each method logic can be in a queue that will run one after the other. We can do this with Action Scheduler I believe.

Payment Modal

We can further optimized this by rendering the stats asynchronously on demand whenever this button is clicked, we can do this via js, so we will no longer need to directly pass the attachments_number data directly to the view when pages are loaded.

P.S: the fix for the bulk optimization & media library page is done properly will directly affect the payment modal part, removing the need for caching there.

Estimate the effort ✅

M - L

jeawhanlee avatar Dec 04 '24 17:12 jeawhanlee

Thank you @jeawhanlee for the grooming. I am not a huge fan of holding counts at each add/delete of images: it can be hard to cover all cases and once the count is wrong, it's hard to go back, etc. I'd rather have a solution where we can update the value from scratch, from time to time.

The background process with AS sounds like a good long-term idea to me. The fix on the Payment modal with "on demand" is also a good short-term solution, but it won't cover the bulk optim. & media library screens ; and it won't be needed if we use a background process with AS.

@remyperona do you have an opinion on this?

@jeawhanlee @remyperona Could you provide:

  1. an estimate for implementing the background task with AS + using the result in the impacted pages?
  2. an estimate for implementign the "on demand" call for the Payment modal?

I would say that, if the background task approach is manageable in 2/3 days, let's do this. Otherwise, if the "on demand" is very quick and the background task way too long, we could fallback to the "on demand" in the very short term...

MathieuLamiot avatar Dec 05 '24 18:12 MathieuLamiot

I think we should break down this issue into 3 sub-issues that can be worked on separately:

  • One to optimize the SQL queries (as stated by Gaël there might be way to improve them, without adding indexes)
  • One to implement the on-demand call for the payment modal (should be fairly quick, couple of days max)
  • One to implement the background processing of this data

For the background processing we have to define how to store the data, and how often we would want to run the task.

remyperona avatar Dec 05 '24 19:12 remyperona

Could we store the data in a WP Option? Or maybe a transient with no expiration.

Once a day would be more than enough.

Do you think the backgrpund thing would be more than M? In which case we can start with the others. But if it is doable in 3/4 days, let's go with it directly, no? It does not seem like a big thing, as we already have AS, and the logic would not change much

MathieuLamiot avatar Dec 05 '24 21:12 MathieuLamiot

An option is the best idea.

I don't think using AS would be more than M. I still think each part has value on its own, and will give cumulative improvements, but there is no issue starting with the background processing.

remyperona avatar Dec 06 '24 14:12 remyperona

Ok, thanks. I agree all ideas will bring value in the long run. @jeawhanlee, @remyperona can one of you create sub-issues based on the above? You can then put the "background task" one in the current sprint. If it's an M, seeing the current state of the board, we can try to move forward with it 💪

MathieuLamiot avatar Dec 06 '24 16:12 MathieuLamiot

Was this sub-issues eventually created ?

Khadreal avatar Dec 09 '24 09:12 Khadreal

Not yet as far as I know.

MathieuLamiot avatar Dec 09 '24 09:12 MathieuLamiot

From a similar sites with about 84k+ images the settings, bulk optimization page was not accessible(timeout error), we found out that the function here could be one of the major culprit, when the query was optimised the pages were opening but a bit slow.

Another thing to point out from this is imagify_has_attachments_without_required_metadata function behaves in a recursive-like, I said this because some of the functions within this function make a call back to this function. For example this Imagify_DB::get_required_wp_metadata_join_clause within imagify_has_attachments_without_required_metadata function whenever it's called also make another call back to this imagify_has_attachments_without_required_metadata.

We would need to refactor this process and also found a more optimised way to get image stat without overloading the DB.

The optimised query looks like this

SELECT p.ID 
FROM wp_posts AS p
WHERE p.post_mime_type IN ('image/jpeg', 'image/png', 'image/gif', 'image/webp', 'application/pdf')
    AND p.post_type = 'attachment'
    AND p.post_status IN ('inherit', 'private')
    AND EXISTS (
        SELECT 1
        FROM wp_postmeta AS imrwpmt1
        WHERE imrwpmt1.post_id = p.ID
            AND imrwpmt1.meta_key = '_wp_attached_file'
            AND (
                imrwpmt1.meta_value IS NULL
                OR imrwpmt1.meta_value LIKE '%://%'
                OR imrwpmt1.meta_value LIKE '_:\\\\\\%'
                OR (
                    REVERSE(LOWER(imrwpmt1.meta_value)) NOT REGEXP '^gpj\\..*|^gepj\\..*|^epj\\..*|^gnp\\..*|^fig\\..*|^pbew\\..*|^fdp\\..*'
                )
            )
    )
    AND NOT EXISTS (
        SELECT 1
        FROM wp_postmeta AS imrwpmt2
        WHERE imrwpmt2.post_id = p.ID
            AND imrwpmt2.meta_key = '_wp_attachment_metadata'
            AND imrwpmt2.meta_value IS NOT NULL
    )
LIMIT 1

cc @wordpressfan

Khadreal avatar Jan 15 '25 17:01 Khadreal