Update ipcache_size and cache_log documentation

Open yigonghu opened this issue 3 years ago • 11 comments

The squid website's description of ipcache_size configuration is rather outdated. It says that

The ipcache_size defines the maximum number of DNS IP cache entries

The description doesn't mention the performance impact of tuning ipcache_size. Increasing ipcache_size does increase memory usage, but each entry takes very little memory and can save a DNS lookup. Since main memory is plentiful these days, it seems to me that the default value could be increased in most scenarios.

Detail: I ran Squid 4.1 on Ubuntu 18.04 as a proxy to visit other web servers. The results show that when ipcache_size is larger than the total number of DNS names, the average latency is about 54.458ms. If ipcache_size is smaller than the total number of DNS names, the latency is about 71.428ms. Thus, increasing ipcache_size gives about a 31% performance benefit.
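Why a cache smaller than the set of requested names can hurt this much is easy to see in a toy model. The sketch below is only an illustration and assumes a plain LRU cache and a round-robin workload; it is not Squid's actual ipcache code or replacement policy, and the hostnames and sizes are made up.

```python
from collections import OrderedDict


def lru_hit_rate(capacity, lookups):
    """Return the fraction of lookups served from a toy LRU cache."""
    cache = OrderedDict()  # hostname -> placeholder value, oldest first
    hits = 0
    for name in lookups:
        if name in cache:
            hits += 1
            cache.move_to_end(name)  # mark as most recently used
        else:
            cache[name] = True  # stands in for a resolved address
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used name
    return hits / len(lookups) if lookups else 0.0


# Hypothetical workload: 100 distinct hostnames requested round-robin, 10 passes.
names = [f"host{i}.example" for i in range(100)]
workload = names * 10

small = lru_hit_rate(50, workload)   # cache smaller than the working set
large = lru_hit_rate(100, workload)  # cache holds every name
print(f"capacity 50:  hit rate {small:.2f}")
print(f"capacity 100: hit rate {large:.2f}")
```

Round-robin access is the worst case for LRU, so real traffic with some popular names would likely show a smaller gap than this model does.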

The Squid website's description of cache_log says:

The cache_log is squid administrative logging file. This is where general information about Squid behavior goes. You can increase the amount of data logged to this file and how often it is rotated with "debug_options".

The description doesn't clarify the performance impact of cache_log. While it is nice to cache the log information, cache_log adds extra I/O operations and can thus cause performance problems. Therefore, I think it would be great to elaborate on the performance impact in the description.

Detail: I ran Squid 4.1 on Ubuntu 18.04 as a proxy to visit other web servers. I tested cache_log under a heavy benchmark. The result shows that the average latency is 51ms when enabling the cache_log and the average latency when disabling cache_log is about 43ms. Thus, disabling cache_log would give about a 24% performance benefit.
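The percentages quoted in the two "Detail" paragraphs depend on which latency is taken as the baseline. The short calculation below works through both readings using only the averages given above; the function and variable names are illustrative. From the rounded cache_log figures it gives roughly 19% or 16% rather than 24%, so that number may have come from unrounded measurements.

```python
def percent_change(baseline_ms, other_ms):
    """Relative difference of other_ms from baseline_ms, in percent."""
    return (other_ms - baseline_ms) / baseline_ms * 100.0


# Average latencies quoted in this report (milliseconds).
ipcache_large, ipcache_small = 54.458, 71.428
log_off, log_on = 43.0, 51.0

# Extra latency of the slower setup relative to the faster one.
ipcache_penalty = percent_change(ipcache_large, ipcache_small)
log_penalty = percent_change(log_off, log_on)

# Latency saved by the faster setup relative to the slower one.
ipcache_saving = -percent_change(ipcache_small, ipcache_large)
log_saving = -percent_change(log_on, log_off)

print(f"ipcache_size: +{ipcache_penalty:.1f}% slower, or {ipcache_saving:.1f}% saved")
print(f"cache_log:    +{log_penalty:.1f}% slower, or {log_saving:.1f}% saved")
```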

yigonghu avatar Sep 19 '20 18:09 yigonghu

Can one of the admins verify this patch?

squid-prbot avatar Sep 19 '20 18:09 squid-prbot

The squid website's description of ipcache_size configuration is rather outdated. It says that

The description is up to date. This directive has not changed since last edit.

The Squid website's description of cache_log says:

The cache_log is squid administrative logging file. This is where general information about Squid behavior goes. You can increase the amount of data logged to this file and how often it is rotated with "debug_options".

The description doesn't clarify the performance impact of cache_log. While it is nice to cache the log information, cache_log adds extra I/O operations and can thus cause performance problems. Therefore, I think it would be great to elaborate on the performance impact in the description.

NOTE: cache_log is not a cache. It is the log for Squid.

Detail: I ran Squid 4.1 on Ubuntu 18.04 as a proxy to visit other web servers. I tested cache_log under a heavy benchmark. The result shows that the average latency is 51ms when enabling the cache_log and the average latency when disabling cache_log is about 43ms. Thus, disabling cache_log would give about a 24% performance benefit.

What you have tested here is not Squid performance with and without a log. You have tested the I/O speed of your disk drive (writing to a file) vs memory (writing to /dev/null).

yadij avatar Sep 19 '20 18:09 yadij

The result shows that the average latency is 51ms when enabling the cache_log and the average latency when disabling cache_log is about 43ms.

@gongxini, what debug_options did you use for that performance test?

Amos: What you have tested here is not Squid performance with and without a log. You have tested the I/O speed of your disk drive (writing to a file) vs memory (writing to /dev/null).

Well, the two things are not mutually exclusive. The performance of a given Squid instance includes the performance of the disk it is using (if it is using a disk).

Testing Squid performance with debugging enabled is, technically, a valid performance test. We just should not overstate its significance -- most production Squids should run with debug_options ALL,1 (i.e. default), of course. With that setting, redirecting cache_log to /dev/null should have no performance effect on a properly operating Squid. If it does have such an effect, there may be a problem somewhere.

rousskov avatar Sep 19 '20 22:09 rousskov

The result shows that the average latency is 51ms when enabling the cache_log and the average latency when disabling cache_log is about 43ms.

@gongxini, what debug_options did you use for that performance test?

I tested with debug_options set to level 6, which the wiki says stores more detailed debug information.

Amos: What you have tested here is not Squid performance with and without a log. You have tested the I/O speed of your disk drive (writing to a file) vs memory (writing to /dev/null).

Well, the two things are not mutually exclusive. The performance of a given Squid instance includes the performance of the disk it is using (if it is using a disk).

Testing Squid performance with debugging enabled is, technically, a valid performance test. We just should not overstate its significance -- most production Squids should run with debug_options ALL,1 (i.e. default), of course. With that setting, redirecting cache_log to /dev/null should have no performance effect on a properly operating Squid. If it does have such an effect, there may be a problem somewhere.

I agree that we should use debug level 1 in production, but it would be nice to mention the performance impact of the other debug levels in the documentation, since it is unclear to me how much overhead I would pay for increasing the debug_options level.

yigonghu avatar Sep 20 '20 15:09 yigonghu

The squid website's description of ipcache_size configuration is rather outdated. It says that

The description is up to date. This directive has not changed since last edit.

Sorry, my wording was a bit confusing. What I mean is that the documentation doesn't mention the performance impact of changing this setting.

The Squid website's description of cache_log says:

The cache_log is squid administrative logging file. This is where general information about Squid behavior goes. You can increase the amount of data logged to this file and how often it is rotated with "debug_options".

The description doesn't clarify the performance impact of cache_log. While it is nice to cache the log information, cache_log adds extra I/O operations and can thus cause performance problems. Therefore, I think it would be great to elaborate on the performance impact in the description.

NOTE: cache_log is not a cache. It is the log for Squid.

Sorry, it is a typo.

Detail: I ran Squid 4.1 on Ubuntu 18.04 as a proxy to visit other web servers. I tested cache_log under a heavy benchmark. The result shows that the average latency is 51ms when enabling the cache_log and the average latency when disabling cache_log is about 43ms. Thus, disabling cache_log would give about a 24% performance benefit.

What you have tested here is not Squid performance with and without a log. You have tested the I/O speed of your disk drive (writing to a file) vs memory (writing to /dev/null).

yigonghu avatar Sep 20 '20 15:09 yigonghu

I tested with debug_options set to level 6

By design, production Squids should not run with debugging levels higher than 1. Higher levels slow Squid down a lot, especially when logging to a disk. Such elevated debugging levels are only meant for problem triage. Unless you are interested in measuring performance of the debugging code, avoid setting debug_options to anything other than ALL,1 (the default) in your performance tests.

While I believe that the principles sketched above apply to most programs and are fairly common knowledge among sysadmins, I would not be violently against adding them to squid.conf.documented because I have seen quite a few sysadmins running with debugging needlessly turned on.

rousskov avatar Sep 20 '20 17:09 rousskov

it is unclear to me how much overhead I would pay for increasing the debug_options level.

In general, there is no need to pay anything: Level-2+ debugging information is useless (and is harmful) in regular production runs. A (properly configured) access log should have the transaction details you may need.

If you can get high-quality estimates of performance overheads of various debugging levels, we should publish that information on Squid wiki. Personally, I do not know how much performance overhead each level introduces. IIRC, most high-performance tests in our lab do not run well with levels above 2 or 3, even when logging to a memory-based partition. YMMV.
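As a rough sketch of how such per-level estimates could be summarized once measurements exist, the helper below compares each level's mean latency against the ALL,1 baseline. The function name, the input shape, and the sample values are assumptions made for illustration; the numbers are placeholders, not measurements of Squid.

```python
from statistics import mean, median


def overhead_table(samples_by_level, baseline_level=1):
    """Summarize latency samples per debug level relative to a baseline level.

    samples_by_level maps a debug level (int) to a list of latencies in ms.
    Returns rows of (level, mean_ms, median_ms, overhead_percent_vs_baseline).
    """
    base_mean = mean(samples_by_level[baseline_level])
    rows = []
    for level in sorted(samples_by_level):
        samples = samples_by_level[level]
        level_mean = mean(samples)
        overhead = (level_mean - base_mean) / base_mean * 100.0
        rows.append((level, level_mean, median(samples), overhead))
    return rows


# Placeholder numbers for illustration only; these are NOT measurements.
placeholder = {
    1: [10.0, 10.0, 10.0],
    2: [11.0, 12.0, 13.0],
}
for level, avg, med, overhead in overhead_table(placeholder):
    print(f"ALL,{level}: mean {avg:.1f} ms, median {med:.1f} ms, {overhead:+.1f}% vs ALL,1")
```

Reporting the median next to the mean helps show whether a few slow requests dominate the average; published results would also need the workload, hardware, and log destination (disk or memory-based partition) stated.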

rousskov avatar Sep 20 '20 17:09 rousskov

it is unclear to me how much overhead I would pay for increasing the debug_options level.

In general, there is no need to pay anything: Level-2+ debugging information is useless (and is harmful) in regular production runs. A (properly configured) access log should have the transaction details you may need.

If you can get high-quality estimates of performance overheads of various debugging levels, we should publish that information on Squid wiki. Personally, I do not know how much performance overhead each level introduces. IIRC, most high-performance tests in our lab do not run well with levels above 2 or 3, even when logging to a memory-based partition. YMMV.

Thank you for the explanation. I can run a comprehensive performance test on each debug level and provide the information. Sorry for the late reply; I have been very busy this week.

yigonghu avatar Sep 26 '20 15:09 yigonghu

I can run a comprehensive performance test on each debug level and provide the information.

If you want to do it and can do it well, then please do it (including publishing the information on the wiki) -- high-quality information in this area is occasionally useful for Squid admins.

rousskov avatar Sep 26 '20 16:09 rousskov

@gongxini, this PR is waiting on your decision whether you want to update the documentation with different texts in light of our comments or to close.

yadij avatar Oct 17 '20 10:10 yadij

@gongxini, this PR is waiting on your decision whether you want to update the documentation with different texts in light of our comments or to close.

Hi Amos, I ran some experiments on the performance overhead of the different debug_options levels and want to share the results on the Squid wiki. I sent an email asking for a Squid wiki account last Friday but am still waiting for it. I think I can publish the experiment results on the wiki and then close this one. Thanks for reminding me!

yigonghu avatar Oct 17 '20 11:10 yigonghu