Image file scan takes longer with clamav version 0.105 or later.
Describe the bug
Image file scans take more time with clamav version 0.105 or higher compared to older versions 0.104.2 or 0.103.8. The issue does not occur while scanning other file types.
The issue occurs when scanning any image file. Here is one example, with the output from both versions below.

0.104.3 or 0.103.8
/ # clamscan --version
ClamAV 0.104.3/26858/Wed Mar 29 07:28:45 2023
/etc/clamav # clamdscan 1.png
/etc/clamav/1.png: OK
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.044 sec (0 m 0 s)
Start Date: 2023:03:29 11:34:18
End Date: 2023:03:29 11:34:18

0.105.2 or 1.0.1
/etc/clamav # clamdscan --version
ClamAV 1.0.1/26858/Wed Mar 29 07:28:45 2023
/etc/clamav # clamdscan 1.png
/etc/clamav/1.png: OK
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.156 sec (0 m 0 s)
Start Date: 2023:03:29 11:25:38
End Date: 2023:03:29 11:25:39
How to reproduce the problem
Scan a PNG file of any size using clamdscan. The scan time with version 0.105 or 1.0.1 is much higher compared to 0.104.2 or 0.103.8.
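To put a number on the difference, here is a minimal Python sketch that pulls the `Time:` value out of two scan summaries and computes the ratio. The helper name `scan_seconds` is made up for illustration, and the sketch only assumes the summary contains a line of the form `Time: <seconds> sec`, as in the output above.

```python
import re

# Matches the "Time: 0.044 sec (0 m 0 s)" part of a scan summary.
TIME_RE = re.compile(r"Time:\s*([0-9]+(?:\.[0-9]+)?)\s*sec")


def scan_seconds(summary: str) -> float:
    """Return the scan time in seconds reported in a scan summary (hypothetical helper)."""
    match = TIME_RE.search(summary)
    if match is None:
        raise ValueError("no 'Time:' line found in scan summary")
    return float(match.group(1))


# Summary lines taken from the two runs reported in this issue.
old_summary = "Infected files: 0\nTime: 0.044 sec (0 m 0 s)"
new_summary = "Infected files: 0\nTime: 0.156 sec (0 m 0 s)"

slowdown = scan_seconds(new_summary) / scan_seconds(old_summary)
print(f"slowdown: {slowdown:.1f}x")  # prints "slowdown: 3.5x"
```

A single run per version is a small sample, so repeating the scan several times and comparing averages would give a more reliable figure.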
This is not surprising. We added a new feature to create fuzzy hashes for PNG, JPEG, TIFF, and GIF images in 0.105 which you can find in the 0.105 release notes.
This is to support image fuzzy hash signatures: https://docs.clamav.net/manual/Signatures/LogicalSignatures.html#image-fuzzy-hash-subsignatures
Could we consider adding a feature that allows the user to disable the fuzzy image scanning option?
@net1 - Yes, Please add option to disable this feature.
@micahsnyder - is it possible to disable this feature?
@micahsnyder This feature made ClamAV much slower for these use cases. Is there a way to avoid this? I checked the change that introduced the feature, but I'm not entirely familiar with the source code. https://github.com/Cisco-Talos/clamav/commit/fd587c741c0ca88d2a6493e9e85a2bf2453687ee#diff-9ea761d49c419551e2cc5b26230a94bcfcd4fd52f1d65ba237d7717bcef999b7R4642
Am I correct to assume, based on the above changes, that it is not possible at the moment to turn this feature off?
I don't believe it is possible to turn it off at this time, without turning off support for scanning image files in general.
What I would like to do in the future is add an option to disable calculating the image fuzzy hash for images that are not found in other files. This would help for the use case where people are scanning all files on their hard drive.
The problem is that image fuzzy hash signatures are particularly effective at detecting malicious emails. So any scanning services that send email attachments to be scanned will want to have this feature enabled.
Dear @micahsnyder, my understanding of the feature is that fuzzy hash signatures are only useful if logical signatures are used (*.ldb *.ldu; *.idb), based on this document: https://docs.clamav.net/manual/Signatures.html
If someone is not using logical signatures with ClamAV, does the fuzzy hash calculation have any value? Why is it still calculated then?
Could you please explain?
@shrgabor Sorry, I missed your question. If you are running clamav with your own custom databases and no logical signatures, then no, there would be no point in the fuzzy hash calculation. I suppose some logic could be added to disable it if there are no logical signatures or, more specifically, if there are no logical signatures that use the fuzzy hash feature.
The only exception I can think of is if clamav / libclamav are used to create fuzzy hashes for making signatures or else for other analysis/comparison purposes, such as when sigtool uses libclamav to generate fuzzy hashes. So some "analyst mode" to always create fuzzy hashes would be needed.
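As a rough illustration of that check, here is a Python sketch. It is not libclamav code. It assumes plain-text logical signature lines whose fields are separated by `;`, with subsignatures starting at the fourth field, and image fuzzy hash subsignatures starting with `fuzzy_img#` as described in the linked documentation. The function names, the `analyst_mode` flag, and the sample signature lines are all hypothetical.

```python
from typing import Iterable

# Prefix of an image fuzzy hash subsignature, per the logical signature docs.
FUZZY_PREFIX = "fuzzy_img#"


def uses_image_fuzzy_hash(signature_lines: Iterable[str]) -> bool:
    """Return True if any logical signature has an image fuzzy hash subsignature."""
    for line in signature_lines:
        line = line.strip()
        if not line:
            continue
        # Fields: name; target description block; logical expression; subsigs...
        fields = line.split(";")
        if any(sub.startswith(FUZZY_PREFIX) for sub in fields[3:]):
            return True
    return False


def should_compute_fuzzy_hash(signature_lines: Iterable[str],
                              analyst_mode: bool = False) -> bool:
    """Compute the hash only if a signature needs it, or if an analyst asks for it."""
    return analyst_mode or uses_image_fuzzy_hash(signature_lines)


# Illustrative signature lines, not taken from a real database.
with_fuzzy = ["Sample.Logo;Engine:150-255,Target:0;0;fuzzy_img#af2ad01ed42993c7"]
without_fuzzy = ["Sample.Bytes;Target:0;0;deadbeef"]

print(should_compute_fuzzy_hash(with_fuzzy))
print(should_compute_fuzzy_hash(without_fuzzy))
print(should_compute_fuzzy_hash(without_fuzzy, analyst_mode=True))
```

In the engine itself the equivalent decision would presumably be made once at database load time rather than by re-reading signature text for every scan.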
On a bit of a tangent: We've been thinking about having an analyst mode for clamav in general for recording and presenting metadata to analysts and developers along with warnings that only a developer or analyst may care for... but we have yet to implement that.