Large number of old files in `assets/images/deferred/`

Open fritzmg opened this issue 1 year ago • 3 comments

Affected version(s)

4.13, 5.3

Description

Contao uses deferred image processing. When an <img>/<picture> is generated, its images are not created right away, but only when a browser requests the respective source. Until then a JSON file will reside in assets/images/deferred/ for each image variant.

When creating image sizes one wants to optimize the image size for a wide range of output devices, meaning there can be a lot of permutations per image due to the variation in output dimensions and output pixel density. However, in many cases some or a lot of these permutations might never get accessed, depending also on the target audience of the website itself (some websites might primarily have 1x desktop access, while others will have 2x desktop and 3x iPhones accessing the site, etc.).

In any case: this means that some image variants will never be accessed and thus the JSON file will remain in assets/images/deferred/. This in itself is not an issue, it's just how the system works.

However, we had a few customers where this caused some issues. In one case the unused JSON files occupied several gigabytes of disk space. It is a webshop with a lot of products where the images are updated automatically, thus creating a new ctime for these images and thus creating new JSON files every time. It also uses a pre-made theme which unfortunately uses a lot of image size variants (probably too many). In another case the file limit of IONOS (250k) was reached and the unused JSON files took up a big chunk of the limit.

Now, in both cases we solved this via a cronjob that simply deletes any JSON files older than 30 days in assets/images/deferred/ (as well as any image older than a year in assets/images/). But this solution would not work if you used the HTTP cache. And of course you should optimize the image sizes so that they generate fewer variants that might never get used.
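
A minimal Python sketch of such a cleanup job, for illustration only. The function name `delete_stale_files` and its parameters are hypothetical, file age is judged by mtime (an assumption), and the path assumes the Contao 4.13 layout described above:

```python
import time
from pathlib import Path
from typing import List, Optional


def delete_stale_files(root: Path, max_age_days: float, pattern: str = "*.json",
                       now: Optional[float] = None, dry_run: bool = False) -> List[Path]:
    """Delete files below `root` matching `pattern` whose mtime is older than `max_age_days`.

    Returns the list of files that were (or, with dry_run, would be) deleted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed: List[Path] = []
    if not root.is_dir():
        return removed
    # Materialize the listing first so we never delete while still iterating.
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and path.stat().st_mtime < cutoff:
            if not dry_run:
                path.unlink()
            removed.append(path)
    return removed


if __name__ == "__main__":
    # Assumed Contao 4.13 path; since Contao 5.0 the JSON files are in var/deferred-images.
    stale = delete_stale_files(Path("assets/images/deferred"), max_age_days=30, dry_run=True)
    print(f"{len(stale)} deferred JSON files older than 30 days")
```

With `dry_run=True` the script only reports what it would delete, which is a sensible first run before scheduling it via cron.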

So I generally wanted to discuss whether anyone has an idea on how this could be improved.

fritzmg • Jun 01 '24 16:06

Do these projects have more JSON files in var/deferred-images than they have images in assets/images?

> Now, in both cases we solved this via a cronjob that simply deletes any JSON files older than 30 days in assets/images/deferred/ (as well as any image older than a year in assets/images/).

Why the differentiation between JSON and image files? Technically deleting either of them should have the exact same implications.

> But this solution would not work if you used the HTTP cache.

I’d probably try to solve this by removing older images only once a month by a cron and afterwards clearing the HTTP cache (and rebuilding it using the crawler) and afterwards also run contao:resize-images so that all JSON files get cleared up directly. Additionally running contao:resize-images daily could make sense.

ausi • Jun 01 '24 18:06

> Do these projects have more JSON files in var/deferred-images than they have images in assets/images?

Do you mean more JSON files in assets/images/deferred than in assets/images/…? In one case yes (the webshop one where the images of the products get their mtime changed frequently).
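
A quick way to compare the two counts on a given installation could look like the following Python sketch. The function name `count_deferred_and_generated` is hypothetical, and it assumes the deferred directory sits inside the images directory as in Contao 4.13:

```python
from pathlib import Path
from typing import Tuple


def count_deferred_and_generated(images_dir: Path, deferred_dir: Path) -> Tuple[int, int]:
    """Return (number of deferred JSON files, number of files already generated)."""
    deferred = sum(1 for p in deferred_dir.rglob("*.json") if p.is_file())
    # Every file below images_dir that is not inside the deferred directory
    # is counted as an already generated image.
    generated = sum(
        1 for p in images_dir.rglob("*")
        if p.is_file() and deferred_dir not in p.parents
    )
    return deferred, generated
```

Calling it with `Path("assets/images")` and `Path("assets/images/deferred")` from the project root gives both numbers for the 4.13 layout.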

> Why the differentiation between JSON and image files?

I did not want to delete any images that are actually in use (i.e. requested by at least one device).

> I’d probably try to solve this by removing older images only once a month by a cron and afterwards clearing the HTTP cache (and rebuilding it using the crawler)

We used to have that, but that got removed in Contao 4.2.2 (https://github.com/contao/contao/commit/51d7a7b920e1808a35f27cae1ff16d37d38c7672). I don't think we should clear the HTTP cache once a month though? Otherwise the cache settings above 1 month do not make any sense anymore.

> Additionally running contao:resize-images daily could make sense.

I did not/do not want to run contao:resize-images as that would generate image files that might never actually get used.

fritzmg • Jun 01 '24 18:06

> Do you mean more JSON files in assets/images/deferred than in assets/images/…?

Yes, since Contao 5.0 they are stored in var/deferred-images.

> We used to have that, but that got removed in Contao 4.2.2 (51d7a7b). I don't think we should clear the HTTP cache once a month though? Otherwise the cache settings above 1 month do not make any sense anymore.

Clearing the HTTP cache would be OK IMO, if you rebuild it directly afterwards. But this is highly project-dependent; I don’t think such a cron job should be added to Contao itself, or at least it should not be enabled by default. An alternative would be to search the HTTP cache for the names of the deleted images and only delete those entries from it.

I’m not sure if there is anything we can improve in Contao itself here. A cleanup command that works reliably everywhere is hard to build I think.

ausi • Jun 02 '24 08:06