plog
plog copied to clipboard
Performance improvements
Hello,
I've tried this logger and found it pretty light and useful. However, I've found multi-threading concurrency bottleneck, so I'd like to improve this a bit. Test code sample and benchmarking result will be provided below.
- Mutex lock is not necessary when performing line formatting and conversion to UTF-8, so I have moved string manipulation out of the locked scope. This gave more concurrency level and about 3 times more performance on Windows with my test sample. Unfortunately, this change won't improve things on Linux.
- Investigating at Linux I've found that application is still running "in one thread". Profiler shows spending a lot of time for mutex lock/unlock, not for disk IO. The problem is inside pthread_mutex_lock, which has too small spinlock iterations count before entering sleep state. I've added spinlock iterations manually, so code became parallel and 3 times faster in my sample in Linux too.
- Omitting several ANSI/UTF-16 conversions and changing 'stream << char[1]' to 'stream << char' gave me another few percent of performance. This just removes redundant strlen() call on char[1]. Additionally strftime is about 30% faster on Windows than manual stringstream conversion, and it looks simpler.
Multithreading test code
void threadFunction()
{
for (int i = 0; i < 800000; ++i)
LOG(plog::debug) << "i=" << i << "clock:" << clock();
}
int main(int argc, char* argv[])
{
plog::init(plog::debug, "Hello.txt");
std::vector<std::thread> threads(4);
clock_t start = clock();
for (size_t i = 0; i < threads.size(); ++i)
threads[i] = std::thread(threadFunction);
for (std::thread& thread : threads)
thread.join();
clock_t finish = clock();
std::cout << "Time: " << finish - start << std::endl;
return 0;
}
Benchmarking results for Linux
Test system is Core i5-4300M with SSD. OS: Windows 10 host, VM: Unubtu Linux 14.04.2.
Original branch
top/iotop results: Disk write speed 4-5 MB/s, CPU 25-30% total (one core at 100%).
victor@ubuntu:~/plog$ time ./test_master_o2
Time: 106639234
real 0m58.393s
user 0m4.579s
sys 1m42.063s
Performance improvement branch
top/iotop results: Disk write speed 17-22 MB/s, CPU 100% total
victor@ubuntu:~/plog$ time ./test_perf_o2_3
Time: 39805657
real 0m11.434s
user 0m16.325s
sys 0m23.483s
Benchmarking results for Windows are very similar.
Thanks for review and feel free to ask additional questions if needed. Victor.
Interesting findings, great thanks.
AFAIK pthread_yield does not exist on MacOSX.
cd /usr/include/pthread grep -r yield * pthread.h:void pthread_yield_np(void); sched.h:extern int sched_yield(void);
@stephane-martin
Yes, the linux man page says about pthread_yield
:
This call is nonstandard, but present on several other systems. Use the standardized
sched_yield
instead
Thanks for noticing it.
I remember about performance improvements made by @victor-istomin and I also made some additional profiling and research. The results are very sad:
-
std::mutex
andpthread_mutex
implementations don't use spinning by default - C/C++ string conversion functions are very slow
I tried to use custom string conversion functions and got a near 5x performance boost!