clamav
Since version 0.105 the scan is unbearably slow
A full scan of $HOME now takes 98 minutes with version 0.105:
----------- SCAN SUMMARY -----------
Known viruses: 8616359
Engine version: 0.105.0
Scanned directories: 6240
Scanned files: 98280
Infected files: 3
Total errors: 8
Data scanned: 22403.29 MB
Data read: 12333.45 MB (ratio 1.82:1)
Time: 5897.640 sec (98 m 17 s)
Start Date: 2022:05:14 10:17:19
End Date: 2022:05:14 11:55:36
After downgrading to 0.104 and scanning the same folder a few minutes later, it completed in 26 minutes:
----------- SCAN SUMMARY -----------
Known viruses: 8616428
Engine version: 0.104.2
Scanned directories: 6240
Scanned files: 97797
Infected files: 3
Data scanned: 17019.32 MB
Data read: 12226.76 MB (ratio 1.39:1)
Time: 1569.143 sec (26 m 9 s)
Start Date: 2022:05:14 09:42:41
End Date: 2022:05:14 10:08:50
That is about four times faster, and the normal duration I experienced in the past. The number of files in the two runs is almost the same, but the "Data scanned" value (whatever that means) is remarkably different.
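To put numbers on that comparison, here is a minimal sketch that parses the two summaries quoted above and computes the ratios. The helper name `parse_summary` is made up for illustration, and the regular expressions only assume the summary layout shown in this report.

```python
import re

# Summaries copied from the two runs quoted above (trimmed to the relevant lines).
SUMMARY_0105 = """\
Engine version: 0.105.0
Scanned files: 98280
Data scanned: 22403.29 MB
Data read: 12333.45 MB (ratio 1.82:1)
Time: 5897.640 sec (98 m 17 s)
"""
SUMMARY_0104 = """\
Engine version: 0.104.2
Scanned files: 97797
Data scanned: 17019.32 MB
Data read: 12226.76 MB (ratio 1.39:1)
Time: 1569.143 sec (26 m 9 s)
"""


def parse_summary(text):
    """Pull the numeric fields out of a clamscan SCAN SUMMARY block."""

    def number(label, unit):
        match = re.search(rf"{label}:\s*([\d.]+) {unit}", text)
        return float(match.group(1))

    return {
        "engine": re.search(r"Engine version:\s*(\S+)", text).group(1),
        "scanned_mb": number("Data scanned", "MB"),
        "read_mb": number("Data read", "MB"),
        "seconds": number("Time", "sec"),
    }


new, old = parse_summary(SUMMARY_0105), parse_summary(SUMMARY_0104)
slowdown = new["seconds"] / old["seconds"]
extra_scanned = new["scanned_mb"] / old["scanned_mb"]
extra_read = new["read_mb"] / old["read_mb"]

print(f"{new['engine']} vs {old['engine']}: {slowdown:.2f}x the wall time")
print(f"data scanned: {extra_scanned:.2f}x, data read: {extra_read:.2f}x")
for run in (old, new):
    rate = run["scanned_mb"] / run["seconds"]
    print(f"{run['engine']}: {rate:.1f} MB scanned per second")
```

The point of separating the three ratios is that "Data read" is nearly identical between the runs while "Data scanned" and the wall time are not, so the difference is in how much extracted data the engine processes, not in how much it reads from disk.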
I now also get a "Can't parse data ERROR" on various PNG and PDF files when scanning with version 0.105; the same files can be processed with version 0.104. (This will be handled in the separate report #593.)
Any suggestion for getting back the old speed, as known up to version 0.104, is appreciated.
I noticed the same thing! For now, Clamav remains only the second solution.
Thank you for letting us know. Is there a particular set of files or file types causing the issue? Any sample files you could provide to demonstrate the issue would help us fix it.
Thanks, Andy
As described, the problem occurs when scanning all of $HOME. I can't tell you which of the >98,000 files is causing the problem; I guess all. The log also doesn't show how long a scan of each file took, but only gives an overall statistic, so I can't provide a sample file either. The overall performance is significantly worse with version 0.105 than with version 0.104, I'm sorry but that's all I can say.
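Since the summary only reports totals, one way to find candidate sample files is to scan each file separately and record the wall time. The following is a rough sketch under stated assumptions, not an official ClamAV tool: it assumes a running clamd that can read the files (so signatures are loaded once, not on every invocation) and uses `clamdscan --no-summary` as the per-file command. The function names are hypothetical.

```python
import subprocess
import time


def time_scan(argv):
    """Run one scanner invocation and return (seconds, exit_code)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def slowest_files(paths, scanner=("clamdscan", "--no-summary"), top=10):
    """Scan each path on its own; return the `top` slowest as (seconds, exit_code, path)."""
    timings = []
    for path in paths:
        seconds, code = time_scan([*scanner, str(path)])
        timings.append((seconds, code, str(path)))
    return sorted(timings, reverse=True)[:top]


# Example use (assumption: clamd is running and may read these files):
#   from pathlib import Path
#   files = [p for p in Path.home().rglob("*") if p.is_file()]
#   for seconds, code, path in slowest_files(files):
#       print(f"{seconds:8.2f}s  exit={code}  {path}")
```

Running the same list against the old and the new version and comparing the top entries would narrow ">98,000 files" down to a handful that could be attached, provided they are not sensitive.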
> I now also get a "Can't parse data ERROR" on various PNG and PDF files when scanning with version 0.105; the same files can be processed with version 0.104.
On this point, I've hit what looks to be the same issue, so I've opened a separate issue for it - #593. I'm unable to attach the PDF I'm experiencing the issue with due to sensitive contents.
@martin-ms, if you have any example files that aren't sensitive and report "Can't parse data ERROR", could you attach them to that ticket to help with diagnosis?
@alext done
Thank you for the updates. I understand that you cannot determine which files are causing the issues. I attempted scanning the sample in https://github.com/Cisco-Talos/clamav/issues/593, and it scanned much quicker with 0.105 than with 0.104, so they don't appear related.
I'll let you know when I am able to reproduce the issue.
When I was reading this earlier, I had initially thought the scan time may be longer because we're now calculating fuzzy hashes for image files. But then we realized a much more obvious reason. In 0.105 we increased the default max file-size, max scan-size, etc.
Specifically:
- MaxFileSize 25M -> 100M
- MaxScanSize 100M -> 400M
- StreamMaxLength 25M -> 100M
- MaxEmbeddedPE 10M -> 40M
- MaxHTMLNormalize 10M -> 40M
- MaxHTMLNoTags 2M -> 8M
- MaxScriptNormalize 5M -> 20M
- PCREMaxFileSize 25M -> 100M
Ref: https://github.com/Cisco-Talos/clamav/pull/489
@martin-ms what scan options do you use? If you're scanning with the defaults, then it would make a lot of sense that 0.105 is significantly slower. 0.105 will be scanning a lot more files, and a lot more data in those files.
Thank you for taking care of the issue.
I had been using the default settings, so I have now changed the variables mentioned back to the old values. clamconf reports these as non-default values:
Config file: clamd.conf
-----------------------
LogFile = "/var/log/clamav/clamd.log"
LogTime = "yes"
PidFile = "/run/clamav/clamd.pid"
TemporaryDirectory = "/tmp"
LocalSocket = "/run/clamav/clamd.ctl"
StreamMaxLength = "26214400"
User = "clamav"
MaxScanSize = "104857600"
MaxFileSize = "26214400"
MaxEmbeddedPE = "10485760"
MaxHTMLNormalize = "10485760"
MaxHTMLNoTags = "2097152"
MaxScriptNormalize = "5242880"
PCREMaxFileSize = "26214400"
but unfortunately that doesn't change the behavior; it still runs much longer than with version 0.104:
----------- SCAN SUMMARY -----------
Known viruses: 8617579
Engine version: 0.105.0
Scanned directories: 6387
Scanned files: 114546
Infected files: 3
Total errors: 8
Data scanned: 22744.92 MB
Data read: 12572.25 MB (ratio 1.81:1)
Time: 5998.697 sec (99 m 58 s)
Start Date: 2022:06:07 10:33:11
End Date: 2022:06:07 12:13:10
For comparison, here is the same task a few minutes later, with the same settings, using version 0.104:
----------- SCAN SUMMARY -----------
Known viruses: 8617620
Engine version: 0.104.2
Scanned directories: 6387
Scanned files: 111993
Infected files: 3
Data scanned: 17411.08 MB
Data read: 12474.23 MB (ratio 1.40:1)
Time: 1710.669 sec (28 m 30 s)
Start Date: 2022:06:07 12:15:43
End Date: 2022:06:07 12:44:13
I don't know if it's important, but with v0.104 I got several
LibClamAV Warning: cli_scanxz: decompress file size exceeds limits - only scanning 27262976 bytes
warnings, but not with v0.105, despite the same settings. Does v0.105 perhaps scan more than the settings allow, or ignore some of the settings? The "Data scanned" values are also different.
Apologies, I should've shared the options for use with clamscan. clamd.conf does not affect the behavior of clamscan. It only affects clamd in combination with clamDscan and clamonacc.
To get a similar effect with clamscan, you can do something like this:
clamscan --max-filesize=25M --max-scansize=100M --max-embeddedpe=10M --max-htmlnormalize=10M --max-htmlnotags=2M --max-scriptnormalize=5M --pcre-max-filesize=25M /path/to/scan
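As a cross-check, the sketch below rebuilds that command line from a table of the pre-0.105 defaults and converts each value to bytes, so it can be compared with the numbers clamconf printed earlier (for example `MaxFileSize = "26214400"`). The table and helper names are made up for illustration; the flags are only those that appear in the command above.

```python
# Pre-0.105 defaults from the list above, keyed by the clamscan flag that sets them.
# StreamMaxLength is left out because the command above has no flag for it.
OLD_LIMITS = {
    "--max-filesize": "25M",
    "--max-scansize": "100M",
    "--max-embeddedpe": "10M",
    "--max-htmlnormalize": "10M",
    "--max-htmlnotags": "2M",
    "--max-scriptnormalize": "5M",
    "--pcre-max-filesize": "25M",
}


def to_bytes(size):
    """Convert a size such as '25M' to bytes, taking 1M as 1024 * 1024."""
    units = {"K": 1024, "M": 1024 ** 2}
    suffix = size[-1].upper()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)


def clamscan_argv(target, limits=OLD_LIMITS):
    """Build the clamscan argument list with the old limits applied."""
    return ["clamscan", *(f"{flag}={value}" for flag, value in limits.items()), target]


print(" ".join(clamscan_argv("/path/to/scan")))
for flag, value in OLD_LIMITS.items():
    print(f"{flag:24} {value:>5} = {to_bytes(value)} bytes")
```

The byte values line up with the clamconf output quoted earlier (25M is 26214400, 100M is 104857600), which confirms the clamd.conf values were the intended ones even though clamscan never read them.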
I tried it with the given command line parameters, but it didn't get significantly faster:
----------- SCAN SUMMARY -----------
Known viruses: 8617586
Engine version: 0.105.0
Scanned directories: 6389
Scanned files: 112078
Infected files: 3
Total errors: 8
Data scanned: 17362.63 MB
Data read: 12453.06 MB (ratio 1.39:1)
Time: 5437.267 sec (90 m 37 s)
Start Date: 2022:06:08 09:29:25
End Date: 2022:06:08 11:00:02
@martin-ms Interesting. I'm not sure what to say. We do a bit of performance profiling/monitoring on a selection of file types, but I think we will have to extend that and compare older and newer versions to understand what's going on.
Abstract booklet CNIC Inflammation Day.pdf
This seems weird. The uploaded PDF takes 120 seconds in 0.105.1 with defaults. Note the 810 MB of scanned data for a file of only 17 MB in size...
root:~# clamscan Abstract\ booklet\ CNIC\ Inflammation\ Day.pdf
Loading: 10s, ETA: 0s [========================>] 8.64M/8.64M sigs
Compiling: 3s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract booklet CNIC Inflammation Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8637607
Engine version: 0.105.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 810.32 MB
Data read: 17.62 MB (ratio 45.99:1)
Time: 120.556 sec (2 m 0 s)
Start Date: 2022:09:29 12:49:59
End Date: 2022:09:29 12:51:59
But it only takes 19 seconds in 0.104.4, or in 0.105.1 with the same limits as 0.104.
root:~# clamscan --max-filesize=25M --max-scansize=100M --max-embeddedpe=10M --max-htmlnormalize=10M --max-htmlnotags=2M --max-scriptnormalize=5M --pcre-max-filesize=25M Abstract\ booklet\ CNIC\ Inflammation\ Day.pdf
Loading: 10s, ETA: 0s [========================>] 8.64M/8.64M sigs
Compiling: 3s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract booklet CNIC Inflammation Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8637607
Engine version: 0.105.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 90.37 MB
Data read: 17.62 MB (ratio 5.13:1)
Time: 19.047 sec (0 m 19 s)
Start Date: 2022:09:29 12:49:25
End Date: 2022:09:29 12:49:44
root:~# clamscan Abstract\ booklet\ CNIC\ Inflammation\ Day.pdf
Loading: 10s, ETA: 0s [========================>] 8.64M/8.64M sigs
Compiling: 3s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract booklet CNIC Inflammation Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8637648
Engine version: 0.104.4
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 90.37 MB
Data read: 17.62 MB (ratio 5.13:1)
Time: 19.072 sec (0 m 19 s)
Start Date: 2022:09:29 12:56:19
End Date: 2022:09:29 12:56:38
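The three runs above can be laid side by side with a few lines of arithmetic. This is only a sketch over the figures already quoted; the `Run` class is hypothetical. It recomputes the "ratio" that clamscan prints (data scanned divided by data read) and the time spent per MB of scanned data.

```python
from dataclasses import dataclass


@dataclass
class Run:
    label: str
    scanned_mb: float
    read_mb: float
    seconds: float

    @property
    def ratio(self):
        # Same quantity clamscan prints as "ratio N:1".
        return self.scanned_mb / self.read_mb

    @property
    def seconds_per_scanned_mb(self):
        return self.seconds / self.scanned_mb


# Figures copied from the three summaries above.
RUNS = [
    Run("0.105.1, defaults", 810.32, 17.62, 120.556),
    Run("0.105.1, old limits", 90.37, 17.62, 19.047),
    Run("0.104.4, defaults", 90.37, 17.62, 19.072),
]

for run in RUNS:
    print(
        f"{run.label:22} ratio {run.ratio:6.2f}:1  "
        f"{run.seconds_per_scanned_mb:.2f} s per scanned MB"
    )
```

The cost per scanned MB stays in the same range across all three runs (roughly 0.15 to 0.21 s), which is at least consistent with the extra time coming from the larger amount of data scanned under the new limits, not from each MB becoming slower. That is one reading of the numbers, not a confirmed diagnosis.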
Any news on this issue? It does not depend on the files to be scanned, anything larger takes a long time. Example: Joomla_4.2.5-Stable-Full_Package.tar.gz (size=24 M)
clamdscan -m Joomla_4.2.5-Stable-Full_Package.tar.gz
/tmp/Joomla_4.2.5-Stable-Full_Package.tar.gz: OK
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 120.016 sec (2 m 0 s)
Start Date: 2022:12:03 10:44:21
End Date: 2022:12:03 10:46:21
For comparison, here is another antivirus. Note that it must load its virus definitions before scanning, and that loading time is included in the total scan time.
SAVScan virus detection utility
Version 5.90.0 [Linux/AMD64]
Virus data version 5.97, November 2022
Includes detection for 79322720 viruses, Trojans and worms
Copyright (c) 1989-2022 Sophos Limited. All rights reserved.
System time 10:46:57 AM, System date 03 December 2022
Command line qualifiers are: -sc -f -di -c -b -all -rec -remove -archive -mime -oe -tnef -pua
Full Scanning
1 file scanned in 27 seconds.
No viruses were discovered.
No PUAs were discovered.
End of Scan.
Pay attention to the number of virus definitions!
Clamav -> 8815934 signatures
Sophos -> 79322720 signatures
I have a problem with slow scans of big PDF files, and I just found that these two settings have the most significant effect on scan time:
- PCREMatchLimit (--pcre-match-limit in command line) default value: 100000?
- PCRERecMatchLimit (--pcre-recmatch-limit in command line) default value: 5000?
The scanned file is an email file, and I created a signature from a JPG image inside its PDF attachment with fuzzyimg:
$ fuzzyimg /tmp/20221203_165750-3-1670055877.msg.8d83562918/3-1670055877.msg.eb940341a8/xxxx.pdf.283ff5d12e/pdf-tmp.35d594783a/pdf01
pdf01: f0e00b0fef9689cc
You can see how the scan time changes as the scan options are adjusted.
Scan with default settings:
$ clamscan 5-1670056633.msg
Loading: 24s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 4s, ETA: 0s [========================>] 42/42 tasks
5-1670056633.msg: Fuzzy.Spam.PDF.UNOFFICIAL FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8815090
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 1.75 MB
Data read: 0.85 MB (ratio 2.06:1)
Time: 76.165 sec (1 m 16 s)
Start Date: 2022:12:03 17:56:22
End Date: 2022:12:03 17:57:38
Scan with the default values specified explicitly:
$ clamscan --pcre-recmatch-limit=5000 --pcre-match-limit=100000 5-1670056633.msg
Loading: 28s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 4s, ETA: 0s [========================>] 42/42 tasks
5-1670056633.msg: Fuzzy.Spam.PDF.UNOFFICIAL FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8815090
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 1.75 MB
Data read: 0.85 MB (ratio 2.06:1)
Time: 87.619 sec (1 m 27 s)
Start Date: 2022:12:03 18:01:32
End Date: 2022:12:03 18:02:36
Scan with half the default values:
$ clamscan --pcre-recmatch-limit=2500 --pcre-match-limit=50000 5-1670056633.msg
Loading: 26s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 4s, ETA: 0s [========================>] 42/42 tasks
5-1670056633.msg: Fuzzy.Spam.PDF.UNOFFICIAL FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8815090
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 1.75 MB
Data read: 0.85 MB (ratio 2.06:1)
Time: 35.717 sec (0 m 35 s)
Start Date: 2022:12:03 18:02:53
End Date: 2022:12:03 18:03:01
However, when I tested with the bigger PDF file above (Abstract.booklet.CNIC.Inflammation.Day.pdf), I was surprised that the PCREMatchLimit / PCRERecMatchLimit settings did not affect the scan time. So other settings may be involved, or the actual scan time was capped by the time limit setting:
LibClamAV debug: cli_unzip: Time limit reached (max: 120000)
LibClamAV debug: Exceeded scan time limit while evaluating logical and yara signatures (max: 120000)
LibClamAV debug: Descriptor[4]: halting after file scan because: Exceeded time limit
LibClamAV debug: Descriptor[3]: halting after file scan because: Exceeded time limit
$ clamscan Abstract.booklet.CNIC.Inflammation.Day.pdf
Loading: 29s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 4s, ETA: 0s [========================>] 42/42 tasks
Abstract.booklet.CNIC.Inflammation.Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8815090
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 261.20 MB
Data read: 17.62 MB (ratio 14.82:1)
Time: 156.588 sec (2 m 36 s)
Start Date: 2022:12:03 18:35:42
End Date: 2022:12:03 18:38:19
$ clamscan --pcre-recmatch-limit=2500 --pcre-match-limit=50000 Abstract.booklet.CNIC.Inflammation.Day.pdf
Loading: 24s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 7s, ETA: 0s [========================>] 42/42 tasks
Abstract.booklet.CNIC.Inflammation.Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8815080
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 255.83 MB
Data read: 17.62 MB (ratio 14.52:1)
Time: 154.080 sec (2 m 34 s)
Start Date: 2022:12:03 18:32:05
End Date: 2022:12:03 18:34:39
$ clamscan --max-filesize=25M --max-scansize=100M --max-embeddedpe=10M --max-htmlnormalize=10M --max-htmlnotags=2M --max-scriptnormalize=5M --pcre-max-filesize=25M --pcre-recmatch-limit=2500 --pcre-match-limit=50000 Abstract.booklet.CNIC.Inflammation.Day.pdf
Loading: 25s, ETA: 0s [========================>] 8.82M/8.82M sigs
Compiling: 5s, ETA: 0s [========================>] 42/42 tasks
Abstract.booklet.CNIC.Inflammation.Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8815090
Engine version: 1.0.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 90.37 MB
Data read: 17.62 MB (ratio 5.13:1)
Time: 78.892 sec (1 m 18 s)
Start Date: 2022:12:03 19:14:01
End Date: 2022:12:03 19:15:20
The problem affects any service that uses Clamav, for example Amavis or Squid eCAP; it easily overwhelms them. What should be done? Should I go back to 0.103.7 or simply disable Clamav?
Scanning with Sophos Protection for Linux (avscanner), the new replacement for Sophos Antivirus for Linux:
time avscanner -ai Abstract.booklet.CNIC.Inflammation.Day.pdf
[10:41:45] Logger av configured for level: INFO
[10:41:45] Archive scanning enabled: yes
[10:41:45] Image scanning enabled: yes
[10:41:45] Following symlinks: no
[10:41:45] Scanning /tmp/Abstract.booklet.CNIC.Inflammation.Day.pdf
[10:41:45] End of Scan Summary:
[10:41:45] 1 file scanned in less than a second.
[10:41:45] 0 files out of 1 were infected.
real 0m0.317s
user 0m0.022s
sys 0m0.019s
> @martin-ms Interesting. I'm not sure what to say. We do a bit of performance profiling/monitoring on a selection of file types, but I think we will have to extend that and compare older and newer versions to understand what's going on.
Any news on that?
From https://www.linuxquestions.org/ Slackware forum:
You are right that scanning of PDFs is slow. A clamscan of the 18 MB AbstractDay.pdf took 3m 21s (the first 1m 22s for loading the databases). I could see the PDF was extracted as six 28 MB "raw" noname files in the /tmp directory. The clamd.conf can be set to disable unpacking (but not scanning) of PDFs.
I tried again with the current version 1.0.0, but I got the same result.
Scanning directories with version 1.0.0:
----------- SCAN SUMMARY -----------
Known viruses: 8651048
Engine version: 1.0.0
Scanned directories: 8009
Scanned files: 107468
Infected files: 3
Data scanned: 26533.77 MB
Data read: 14757.64 MB (ratio 1.80:1)
Time: 6347.488 sec (105 m 47 s)
Start Date: 2023:02:04 09:24:51
End Date: 2023:02:04 11:10:38
Then the same directories with the same options and settings with version 0.104.2:
----------- SCAN SUMMARY -----------
Known viruses: 8651044
Engine version: 0.104.2
Scanned directories: 8009
Scanned files: 107548
Infected files: 3
Data scanned: 20594.58 MB
Data read: 14766.02 MB (ratio 1.39:1)
Time: 1988.306 sec (33 m 8 s)
Start Date: 2023:02:04 11:12:41
End Date: 2023:02:04 11:45:50
It's still more than three times slower. I'll stay with 0.104.2 for now, but may have to look for something else as I can't work with an outdated version forever.
Now Clamav 1.0.0 can only reasonably be used for small files (perhaps under 1 MB). Is this by design? It's OK if you can use another antivirus solution, but in my case, because of systemd, I don't have another solution for production servers (e.g. mail, proxy).
A commercial alternative that is compatible with Clamav could be IKARUS scan.server. A quick start guide here.
ClamAV interface
Starting with version 1.7.0, IKARUS scan.server supports a ClamAV compatible TCP socket that mimics clamd (default TCP port: 3310). It only supports the scanning of single files and buffers. For further information regarding the use of the interface directly, please read the ClamAV documentation at https://www.clamav.net/documents/scanning#clamd .
It also works as a unix socket and version 6.1.7 includes the option to configure socket permissions (very useful).
Still no improvement to the existing problem in version 1.1.0!
To those affected, could you please provide a flamegraph showing where clamav is spending more time? This is the best way to show us what you're seeing on your system so we can figure out a fix.
Instructions here: https://docs.clamav.net/manual/Development/performance-profiling.html?highlight=flame#flame-graph-profiling
Ideally, we'd like a flamegraph of the older, more performant version and the latest so we can compare the two.
I'm sorry, but I don't have the necessary hardware to perform those operations. What I and others have noticed is that Data scanned has increased and recent versions of Clamav quickly scan only small files (under 1 MB). For any type of larger file, the scanning time is very long. I mention that I cannot use LLVM for compilation (I have the latest version 16.0.3) due to compatibility problems.
> Instructions here: https://docs.clamav.net/manual/Development/performance-profiling.html?highlight=flame#flame-graph-profiling
LibClamAV Error: cl_load(): No such file or directory: clamav.hdb
I don't know where to obtain the missing file; it is not part of the distributed installation package.
[UPDATE]
OK... found the file now in the source archive, issued
perf record -F 100 -g -- clamscan -d clamav.hdb --allmatch ./test/
and the results are for 0.104.2
and for 1.0.1
The performance was the same both times, I can't imagine what this test is good for, it doesn't bring any new insights.
I compiled perf, recompiled clamav (1.0.1) with debug symbols and made some tries.
root:~# perf script > /tmp/out.perf
dso__load_sym: failed to find program header for symbol: _etext st_value: 0x277c5
root:~# perf record -F 100 -g -- clamscan Abstract\ booklet\ CNIC\ Inflammation\ Day.pdf
Loading: 9s, ETA: 0s [========================>] 8.67M/8.67M sigs
Compiling: 2s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract booklet CNIC Inflammation Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8665703
Engine version: 1.0.1
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 472.84 MB
Data read: 17.62 MB (ratio 26.83:1)
Time: 132.078 sec (2 m 12 s)
Start Date: 2023:05:06 20:16:35
End Date: 2023:05:06 20:18:47
[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 0.875 MB perf.data (13206 samples) ]
root:~# perf script > /tmp/out.perf
dso__load_sym: failed to find program header for symbol: _etext st_value: 0x277c5
I don't know what that symbol error means. I generated the SVG, but I'm not sure whether it is of any use.
As I already said in a previous comment, I don't know how to handle the -d ./unit_tests/clamav.hdb option in the given example, so I just used
perf record -F 100 -g -- /usr/bin/clamscan -ir $HOME
as the command line with the following results:
----------- SCAN SUMMARY -----------
Known viruses: 8665707
Engine version: 0.104.2
Scanned directories: 8457
Scanned files: 145265
Infected files: 3
Data scanned: 21981.75 MB
Data read: 15944.07 MB (ratio 1.38:1)
Time: 2221.037 sec (37 m 1 s)
Start Date: 2023:05:07 13:22:45
End Date: 2023:05:07 13:59:46
[ perf record: Woken up 55 times to write data ]
[ perf record: Captured and wrote 13,965 MB perf.data (218477 samples) ]
----------- SCAN SUMMARY -----------
Known viruses: 8665703
Engine version: 1.0.1
Scanned directories: 8462
Scanned files: 145753
Infected files: 3
Data scanned: 28681.03 MB
Data read: 15988.41 MB (ratio 1.79:1)
Time: 6552.762 sec (109 m 12 s)
Start Date: 2023:05:07 14:15:39
End Date: 2023:05:07 16:04:52
[ perf record: Woken up 175 times to write data ]
[ perf record: Captured and wrote 44,184 MB perf.data (656843 samples) ]
For me it's all useless stuff & wasted time, but for those who like it...
> This seems weird. The uploaded PDF takes 120 seconds in 0.105.1 with defaults. Note the 810 MB of scanned data for a file of only 17 MB in size...
I did some more tests with this PDF file and found that it seems to keep clamscan busy until one of the limits is hit, which can be shown by adding the --alert-exceeds-max=yes switch to the command line.
In 0.103.8 the default MaxFileSize gets hit pretty quickly, and that's why the scan appears to be so fast. When the size limits are increased, the scan runs longer until it hits MaxScanTime.
With newer versions and their higher default size limits, the MaxScanTime limit gets hit when running the engine with defaults, but when the time limit is increased, one of the size limits gets hit as well.
I tried this with size limits of up to 1000MB and time limits up to 200 seconds and got no regular finish of the scan process with either version.
I think the ClamAV team should have a closer look at this file to see why it is always driving ClamAV to its limits.
[Update] See my correction in the next post [/Update]
$ clamscan --alert-exceeds-max=yes Abstract.booklet.CNIC.Inflammation.Day.pdf
Loading: 12s, ETA: 0s [========================>] 8.67M/8.67M sigs
Compiling: 3s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract.booklet.CNIC.Inflammation.Day.pdf: Heuristics.Limits.Exceeded.MaxScanTime FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8669437
Engine version: 1.1.0
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 714.02 MB
Data read: 17.62 MB (ratio 40.52:1)
Time: 136.619 sec (2 m 16 s)
Start Date: 2023:06:21 14:37:08
End Date: 2023:06:21 14:39:25
$ clamscan --alert-exceeds-max=yes Abstract.booklet.CNIC.Inflammation.Day.pdf
/tmp/Abstract.booklet.CNIC.Inflammation.Day.pdf: Heuristics.Limits.Exceeded.MaxFileSize FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8669401
Engine version: 0.103.8
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 168.30 MB
Data read: 17.62 MB (ratio 9.55:1)
Time: 25.319 sec (0 m 25 s)
Start Date: 2023:06:21 14:47:11
End Date: 2023:06:21 14:47:37
$ clamscan --alert-exceeds-max=yes --max-filesize=1000M --max-scansize=1000M Abstract.booklet.CNIC.Inflammation.Day.pdf
LibClamAV Error: pdf_find_and_extract_objs: Timeout reached in the PDF parser while extracting objects.
/tmp/Abstract.booklet.CNIC.Inflammation.Day.pdf: Heuristics.Limits.Exceeded.MaxScanTime FOUND
----------- SCAN SUMMARY -----------
Known viruses: 8669401
Engine version: 0.103.8
Scanned directories: 0
Scanned files: 1
Infected files: 1
Data scanned: 758.67 MB
Data read: 17.62 MB (ratio 43.05:1)
Time: 136.556 sec (2 m 16 s)
Start Date: 2023:06:21 14:48:24
End Date: 2023:06:21 14:50:40
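When many files are scanned with --alert-exceeds-max=yes, the limit that ended each scan can be pulled out of the output mechanically. A minimal sketch, assuming only the line format visible in the runs above (`<path>: Heuristics.Limits.Exceeded.<Limit> FOUND`); the function name is made up for illustration.

```python
import re

# Matches lines like "<path>: Heuristics.Limits.Exceeded.<Limit> FOUND".
LIMIT_RE = re.compile(r"^(?P<path>.+): Heuristics\.Limits\.Exceeded\.(?P<limit>\w+) FOUND$")


def limits_hit(output):
    """Return (path, limit_name) for every file whose scan stopped at a limit."""
    hits = []
    for line in output.splitlines():
        match = LIMIT_RE.match(line.strip())
        if match:
            hits.append((match["path"], match["limit"]))
    return hits


# Result lines copied from the runs above, plus one summary line that must not match.
SAMPLE = """\
/root/Abstract.booklet.CNIC.Inflammation.Day.pdf: Heuristics.Limits.Exceeded.MaxScanTime FOUND
/tmp/Abstract.booklet.CNIC.Inflammation.Day.pdf: Heuristics.Limits.Exceeded.MaxFileSize FOUND
Infected files: 1
"""

for path, limit in limits_hit(SAMPLE):
    print(f"{limit:12} {path}")
```

Counting the limit names over a full $HOME scan on both versions would show whether the slow version is mostly running into MaxScanTime where the old one stopped early at MaxFileSize.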
I have to correct myself regarding the assumed never-ending scan of that file. It turned out that with just a little more resources than I had tried before, the scan does come to an end in both versions with comparable timings:
$ clamscan --alert-exceeds-max=yes --max-scantime=200000 --max-filesize=1200M --max-scansize=1200M Abstract.booklet.CNIC.Inflammation.Day.pdf
Loading: 12s, ETA: 0s [========================>] 8.67M/8.67M sigs
Compiling: 3s, ETA: 0s [========================>] 41/41 tasks
/root/Abstract.booklet.CNIC.Inflammation.Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8669437
Engine version: 1.1.0
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 1001.11 MB
Data read: 17.62 MB (ratio 56.81:1)
Time: 181.161 sec (3 m 1 s)
Start Date: 2023:06:21 15:22:59
End Date: 2023:06:21 15:26:01
$ clamscan --alert-exceeds-max=yes --max-scantime=200000 --max-filesize=1200M --max-scansize=1200M Abstract.booklet.CNIC.Inflammation.Day.pdf
/root/Abstract.booklet.CNIC.Inflammation.Day.pdf: OK
----------- SCAN SUMMARY -----------
Known viruses: 8669401
Engine version: 0.103.8
Scanned directories: 0
Scanned files: 1
Infected files: 0
Data scanned: 1001.11 MB
Data read: 17.62 MB (ratio 56.81:1)
Time: 174.856 sec (2 m 54 s)
Start Date: 2023:06:21 15:33:56
End Date: 2023:06:21 15:36:51
@martin-ms thank you for making the flamegraph. Sadly it lacks the debug symbols required to show real insight into what's going on. A debug build of clamav would be required (i.e. building with the -g CFLAG).
As @rma-x identified, it seems this particular file is very slow for both versions if you crank up the scan limits.
I'll see if I can do something similar with flamegraph and scanning the provided Abstract booklet CNIC Inflammation Day.pdf. Maybe it will shed some light on why this file is so slow to scan.