magento2-module-image-cleanup

Cannot delete the subdirectory

Open hgati opened this issue 1 year ago • 5 comments

Hi, I am trying to delete an unused hash directory, but an error occurs stating that the subdirectory (image) cannot be deleted.

Image

Under the hash directory, there is a directory called "image" with the following contents.

Image

hgati avatar Jul 09 '24 02:07 hgati

My guess is that while you're trying to delete the directory, some new images are being generated by visitors on the frontend of the shop. Which is kind of weird, as this particular subdirectory is not supposed to be in use.

You can try running the command a few times in a row, and eventually it should work, but I'm afraid that this particular directory will come back.

So we need to figure out why this tool thinks this subdirectory is not in use, while it actually seems to be in use.

hostep avatar Jul 09 '24 06:07 hostep

It seems your guess was correct. After running it several times, it eventually got deleted.

Image

hgati avatar Jul 09 '24 07:07 hgati

I dug into this some more after running into the same issue myself on a project.

So, this module calculates the hashes from the different image specs defined in the etc/view.xml files in all themes of the project. However, you can also request resized images programmatically (see the code snippet below). This tool can't discover the hashes generated that way, as they aren't defined in a consistent or easy-to-find way.

Snippet to resize image in code, that I found in one of our projects:

        /* $this->_imageHelper is an instance of \Magento\Catalog\Helper\Image */

        $images = $product->getMediaGalleryImages();

        foreach ($images as $image) {
            $smallImageUrl = $this->_imageHelper
                ->init($product, 'product_page_image_small', ['width' => '400', 'height' => '300'])
                ->setImageFile($image->getFile())
                ->getUrl();
            $image->setData('small_image_url', $smallImageUrl);
        }

This snippet says: "take the image definitions defined in the view.xml file for product_page_image_small and overwrite the width & height params".
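To make the consequence concrete, here is a minimal sketch of why an override ends up in a directory the tool doesn't know about. The `illustrativeHash` function and the parameter values are made up for illustration and are not Magento's real hashing algorithm; the only point is that changing any parameter changes the directory name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the hash derived from image parameters.
 * NOT Magento's real algorithm; it only demonstrates that any changed
 * parameter produces a different directory name.
 */
function illustrativeHash(array $params): string
{
    ksort($params); // make the result independent of key order
    return md5(http_build_query($params));
}

// Parameters as they might be declared in etc/view.xml (values are made up).
$fromViewXml = ['type' => 'small_image', 'width' => 88, 'height' => 110];

// The same definition, but with width and height overridden in code.
$overridden = array_merge($fromViewXml, ['width' => 400, 'height' => 300]);

$knownHash = illustrativeHash($fromViewXml);
$runtimeHash = illustrativeHash($overridden);

// The override lands in a directory that view.xml alone never predicts.
var_dump($knownHash === $runtimeHash); // bool(false)
```

A tool that only reads view.xml would compute `$knownHash` and treat the directory named after `$runtimeHash` as unused.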


A possible solution I'm currently thinking about, which requires some manual work, is to give you the opportunity to define those extra specifications in the backoffice. The tool would then take them into consideration when calculating which hashes are in use, so we don't remove directories that are actually being used. Automatically detecting this would be hard, I think, so having a way to manually configure these extra exceptions is probably the way to go.
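Roughly, the idea could look like the sketch below. Nothing here exists in the module yet: `findRemovableDirectories` and the example hash names are hypothetical, and the sketch assumes the manually configured specifications have already been turned into hashes the same way the view.xml ones are.

```php
<?php
declare(strict_types=1);

/**
 * @param string[] $discovered hashes calculated from the view.xml files
 * @param string[] $manual     hashes calculated from manually configured specs (hypothetical setting)
 * @param string[] $onDisk     hash directory names found in the image cache folder
 * @return string[] directories that neither source claims as in use
 */
function findRemovableDirectories(array $discovered, array $manual, array $onDisk): array
{
    $inUse = array_unique(array_merge($discovered, $manual));

    // array_diff keeps the original keys, so reindex the result.
    return array_values(array_diff($onDisk, $inUse));
}

$removable = findRemovableDirectories(
    ['aaa111'],                     // known from view.xml
    ['bbb222'],                     // added by hand in the backoffice
    ['aaa111', 'bbb222', 'ccc333']  // present on disk
);

print_r($removable); // only 'ccc333' remains a removal candidate
```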

hostep avatar Dec 17 '24 13:12 hostep

Here is the code from the Mirasvit Follow Up Email extension that sends emails to users. It also generates resized images using hardcoded values (160x160).

<?php
$collection = $block->getItems();
$idx = 0;
/** @var \Mirasvit\Core\Helper\Image $imageHelper */
$imageHelper = $this->helper('Mirasvit\Core\Helper\Image');

?>
<?php if ($collection->getSize()): ?>
    <table id="cross_sells_block" width="0" border="0" cellspacing="1" cellpadding="5">
        <tr>
            <?php foreach ($collection as $product): ?>
                <td valign="top" align="center" width="25%">
                    <a href="<?= $product->getProductUrl() ?>" title="<?= $block->escapeHtml($product->getName()) ?>">
                    <?php // https://localhost/media/catalog/product/cache/160x160/077fd1980b44449e6aab430673fbdfa09c3982ab1566e7931152c1417054276f/db829f3462d318c0e26591a06a08bd2b8afccc8ad5e69a550eb7106126fdd227.jpg ?>
                        <img
                            src="<?= $imageHelper->init($product, 'image', 'catalog/product')->resize(160, 160) ?>"
                            width="100" height="100" alt="<?= $block->escapeHtml($block->getImage($product, 'category_page_grid')->getLabel()) ?>"/>
                    </a>
                    <p>
                        <a href="<?= $product->getProductUrl() ?>"><?= $product->getName() ?></a>
                    </p>
                </td>
                <?php $idx++ ?>
                <?php if ($idx >= 4) break; ?>
            <?php endforeach ?>
        </tr>
    </table>
<?php endif ?>

Image

Image

Image

Oh my God! Now I finally understand why the 160x160 directory was created. The Mirasvit extension resizes product images when sending emails and includes them in the email content. As a result, I just discovered that after an image cleanup, product images were broken when customers opened their emails. The Follow Up Email extension was sending emails to customers periodically on its cron schedule, automatically generating product thumbnails as it did so.

Image

There are likely many third-party modules with similar hardcoded implementations.

Also I have seen options in the admin panel settings of the current theme where I can adjust the mobile image width and other settings related to the product page. There are likely many such image sizes specified through admin panel settings.

Image

Image

Image

Image

So, it seems that after performing image cleanup, the product images on the main page of the site always disappear.

Every time I run the image cleanup, it reports around 20GB of files to clean up. I was curious why so many files kept being generated automatically, but now I understand that there was a reason for it.

The 160x160 directory contains temporary images included in emails (with relatively small file sizes), while the other hashed directories are automatically generated images triggered by admin theme settings when loading pages. In other words, each hashed directory corresponds to images generated for specific pages like the homepage, category pages, etc. However, since these hashed directories are quickly filling up to 20GB within a day or two, I think that bots are crawling the entire site, causing the images to be generated excessively.

hgati avatar Mar 02 '25 16:03 hgati

Hey @hgati

That's very useful information, thank you for investigating and bringing this to light.

I'll see what we can do about this. I have an idea I started working on a few months ago over here, which dynamically listens for all resized image usages and should probably also find this particular one. The plan is to store all that information in the database (that's still to do, it's currently stored in cache) together with the timestamp of when each was last seen, and then use that information to avoid removing directories that were seen in use in the last x days (where x is configurable). There's still quite some work ahead for this feature; I'll see when I find time to pick it up again.
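As a rough sketch of that last step, assuming the last-seen timestamps are already available as a plain array: the function name, the array shape, and the choice to treat never-seen directories as stale are all assumptions for illustration, not the actual implementation.

```php
<?php
declare(strict_types=1);

/**
 * @param array<string,int> $lastSeen      hash directory => Unix timestamp of last observed use
 * @param string[]          $onDisk        hash directory names found on disk
 * @param int               $retentionDays the configurable "x"
 * @param int               $now           current Unix timestamp
 * @return string[] directories not seen in use within the retention window
 */
function findStaleDirectories(array $lastSeen, array $onDisk, int $retentionDays, int $now): array
{
    $cutoff = $now - $retentionDays * 86400;
    $stale = [];

    foreach ($onDisk as $dir) {
        // Assumption: a directory that was never seen counts as stale.
        if (!isset($lastSeen[$dir]) || $lastSeen[$dir] < $cutoff) {
            $stale[] = $dir;
        }
    }

    return $stale;
}

$now = 1700000000;
$lastSeen = [
    'aaa111' => $now - 2 * 86400,  // used two days ago
    'bbb222' => $now - 40 * 86400, // used forty days ago
];

print_r(findStaleDirectories($lastSeen, ['aaa111', 'bbb222', 'ccc333'], 30, $now));
// 'bbb222' and 'ccc333' are stale; 'aaa111' is kept
```

With this approach, a directory like 160x160 would survive cleanup as long as the emails kept requesting it within the retention window.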

For now, I would suggest not using the remove-unused-hash-directories CLI command until this is properly fixed.

Thanks again!

hostep avatar Mar 03 '25 06:03 hostep