magento2-module-image-cleanup

Request: Duplicate images

Open jorgb90 opened this issue 1 year ago • 3 comments

A great addition to this already great plugin would be an option to remove duplicate images. Duplicates can occur, for example, when importing products in bulk.

jorgb90 • Apr 17 '24 08:04

Hi @jorgb90: can you give me some more info about this particular situation?

The tool already removes any file that isn't referenced in the database, so even if duplicate images exist on the filesystem, they should get removed as long as they aren't referenced in the database.
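As a rough illustration of that behaviour (written in Python for brevity, not taken from the module's own code), the check boils down to a set difference between the files on disk and the image paths stored in the database. The sketch assumes database values look like /a/b/ab.jpg, relative to pub/media/catalog/product; the helper names list_media_files and find_unreferenced and the example filenames are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def list_media_files(media_root: Path) -> List[str]:
    """Every file below media_root, as a POSIX path relative to media_root."""
    return sorted(
        p.relative_to(media_root).as_posix()
        for p in media_root.rglob("*")
        if p.is_file()
    )


def find_unreferenced(on_disk: Iterable[str], referenced: Iterable[str]) -> List[str]:
    """Paths present on disk that no database value points at."""
    # Database values are assumed to look like "/a/b/ab.jpg"; strip the
    # leading slash so they compare equal to the relative disk paths.
    wanted = {ref.lstrip("/") for ref in referenced}
    return sorted(path for path in on_disk if path.lstrip("/") not in wanted)


if __name__ == "__main__":
    # Hypothetical example: a second copy of the same upload that nothing references.
    disk = ["a/b/ab.jpg", "a/b/ab_1.jpg"]
    db_values = ["/a/b/ab.jpg"]
    print(find_unreferenced(disk, db_values))  # ['a/b/ab_1.jpg']
```

A real implementation would also have to skip generated folders such as the resized-image cache, which this sketch does not do. It also shows why a duplicate that *is* referenced survives this check.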

So I'm not quite sure what you mean exactly. A concrete example would help here.

Thanks!

hostep • Apr 17 '24 12:04

@hostep I have a Magento setup in which every product has all of its images three times. I first thought this was happening because the products are updated through imports, but upon further investigation it's happening because the images are uploaded for the global scope and for 2 store views. :/ I guess we just need to delete the duplicates in the store views.

My initial thought was that, because these images are uploaded and present in both the database and the filesystem, this extension currently won't detect them. However, the hashes of those images should be identical since they are the same image, so comparing hashes would be a way to clean them up as well and prevent duplicate images.
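Comparing images by content rather than by filename could look roughly like the sketch below: hash each file's bytes and group files that share a digest. SHA-256 and the function names file_digest and group_duplicates are choices made for this illustration, not something the module provides.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """SHA-256 of a file's bytes, read in chunks so large images aren't loaded whole."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def group_duplicates(
    paths: Iterable[Path], digest: Callable[[Path], str] = file_digest
) -> List[List[Path]]:
    """Groups of two or more files whose contents hash to the same digest."""
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        by_hash[digest(path)].append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Identical bytes give identical digests, so repeated uploads of the same file end up in one group. An image that was re-encoded or resized on upload would not match, which is an inherent limitation of byte-level hashing.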

jorgb90 • Apr 17 '24 18:04

Aha, okay, that makes sense now.

Do the database entries use the exact same filename for global and storeview values? Or are the files also duplicated on disk with a different filename?

If they use the same filename, I think the EAV Cleaner module's eav:attributes:restore-use-default-value command might be able to get rid of those unneeded values in the database.
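For the same-filename case, the idea is that a store view value identical to the global (store_id 0) value adds nothing. It can be dropped so that the store view falls back to the default. The sketch below only illustrates that comparison on rows that are already loaded. The AttributeValue row shape is hypothetical, and the code is not taken from EAV Cleaner itself.

```python
from typing import Dict, List, NamedTuple, Tuple


class AttributeValue(NamedTuple):
    """Hypothetical flattened row from an EAV value table."""
    entity_id: int
    attribute_id: int
    store_id: int  # 0 is the global scope
    value: str


def redundant_store_values(rows: List[AttributeValue]) -> List[AttributeValue]:
    """Store view rows whose value is identical to the global value."""
    global_values: Dict[Tuple[int, int], str] = {
        (r.entity_id, r.attribute_id): r.value for r in rows if r.store_id == 0
    }
    return [
        r
        for r in rows
        if r.store_id != 0
        and (r.entity_id, r.attribute_id) in global_values
        and global_values[(r.entity_id, r.attribute_id)] == r.value
    ]
```

Rows with no global counterpart are never flagged, because removing them would change what the store view displays.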

If they don't use the same filename, then I guess we could add a check for this to this module by:

  • looping over all products that have more than a single image (global and store view values combined)
  • calculating the hash of each image file
  • if a duplicate match is found, trying to delete the duplicate value from the database
  • after this command has finished, running our catalog:images:remove-unused-files command, which should then delete the duplicates from disk since they are no longer referenced in the database
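The first three steps could be sketched as follows, assuming the product image values have already been loaded into a flat list. GalleryEntry is a hypothetical flattened view of Magento's media gallery data. digest stands for any function that maps a stored path to a content hash. Keeping the copy with the lowest store_id (the global one) is an assumption of this sketch. The returned value_ids are the rows such a command would delete before catalog:images:remove-unused-files runs.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, NamedTuple, Set


class GalleryEntry(NamedTuple):
    """Hypothetical flattened image value; not Magento's actual table layout."""
    product_id: int
    value_id: int
    store_id: int  # 0 is the global scope
    file: str      # stored path, e.g. "/a/b/ab.jpg"


def duplicate_value_ids(
    entries: Iterable[GalleryEntry], digest: Callable[[str], str]
) -> List[int]:
    """value_ids whose image content repeats another image of the same product."""
    by_product: Dict[int, List[GalleryEntry]] = defaultdict(list)
    for entry in entries:
        by_product[entry.product_id].append(entry)

    hash_cache: Dict[str, str] = {}
    to_delete: List[int] = []
    for product_entries in by_product.values():
        # Step 1: only products with more than one image value can contain duplicates.
        if len(product_entries) < 2:
            continue
        seen: Set[str] = set()
        # Visit global values first so the store view copies are the ones flagged.
        for entry in sorted(product_entries, key=lambda e: (e.store_id, e.value_id)):
            # Step 2: hash each image file, once per distinct path.
            if entry.file not in hash_cache:
                hash_cache[entry.file] = digest(entry.file)
            content_hash = hash_cache[entry.file]
            # Step 3: a repeated hash within the product marks a duplicate value.
            if content_hash in seen:
                to_delete.append(entry.value_id)
            else:
                seen.add(content_hash)
    return sorted(to_delete)
```

Hashes are cached per path, so a file referenced by several rows is read only once. On a large catalog, reading every image file is still the expensive part.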

This would probably be a very expensive operation and might take many minutes, if not hours, to run if you have a large number of products. So I'm not sure yet whether we will have time to implement this, or whether it's worth it at the moment.

hostep • Apr 17 '24 18:04