plog icon indicating copy to clipboard operation
plog copied to clipboard

Performance improvements

Open victor-istomin opened this issue 9 years ago • 4 comments

Hello,

I've tried this logger and found it pretty light and useful. However, I've found multi-threading concurrency bottleneck, so I'd like to improve this a bit. Test code sample and benchmarking result will be provided below.

  1. Mutex lock is not necessary when performing line formatting and conversion to UTF-8, so I have moved string manipulation out of the locked scope. This gave more concurrency level and about 3 times more performance on Windows with my test sample. Unfortunately, this change won't improve things on Linux.
  2. Investigating at Linux I've found that application is still running "in one thread". Profiler shows spending a lot of time for mutex lock/unlock, not for disk IO. The problem is inside pthread_mutex_lock, which has too small spinlock iterations count before entering sleep state. I've added spinlock iterations manually, so code became parallel and 3 times faster in my sample in Linux too.
  3. Omitting several ANSI/UTF-16 conversions and changing 'stream << char[1]' to 'stream << char' gave me another few percent of performance. This just removes redundant strlen() call on char[1]. Additionally strftime is about 30% faster on Windows than manual stringstream conversion, and it looks simpler.

Multithreading test code

    void threadFunction()
    {
        for (int i = 0; i < 800000; ++i)
            LOG(plog::debug) << "i=" << i << "clock:" << clock();
    }

    int main(int argc, char* argv[])
    {
        plog::init(plog::debug, "Hello.txt");
        std::vector<std::thread> threads(4);

        clock_t start = clock();

        for (size_t i = 0; i < threads.size(); ++i)
            threads[i] = std::thread(threadFunction);

        for (std::thread& thread : threads)
            thread.join();

        clock_t finish = clock();

        std::cout << "Time: " << finish - start << std::endl;

        return 0;
    }

Benchmarking results for Linux

Test system is Core i5-4300M with SSD. OS: Windows 10 host, VM: Unubtu Linux 14.04.2.

Original branch

top/iotop results: Disk write speed 4-5 MB/s, CPU 25-30% total (one core at 100%).

  victor@ubuntu:~/plog$ time ./test_master_o2
  Time: 106639234

  real  0m58.393s
  user  0m4.579s
  sys   1m42.063s

Performance improvement branch

top/iotop results: Disk write speed 17-22 MB/s, CPU 100% total

  victor@ubuntu:~/plog$ time ./test_perf_o2_3
  Time: 39805657

  real  0m11.434s
  user  0m16.325s
  sys   0m23.483s

Benchmarking results for Windows are very similar.

Thanks for review and feel free to ask additional questions if needed. Victor.

victor-istomin avatar May 29 '15 02:05 victor-istomin

Interesting findings, great thanks.

SergiusTheBest avatar Jun 03 '15 10:06 SergiusTheBest

AFAIK pthread_yield does not exist on MacOSX.

cd /usr/include/pthread grep -r yield * pthread.h:void pthread_yield_np(void); sched.h:extern int sched_yield(void);

stephane-martin avatar Mar 14 '16 02:03 stephane-martin

@stephane-martin

Yes, the linux man page says about pthread_yield:

This call is nonstandard, but present on several other systems. Use the standardized sched_yield instead

Thanks for noticing it.

SergiusTheBest avatar Mar 14 '16 09:03 SergiusTheBest

I remember about performance improvements made by @victor-istomin and I also made some additional profiling and research. The results are very sad:

  • std::mutex and pthread_mutex implementations don't use spinning by default
  • C/C++ string conversion functions are very slow

I tried to use custom string conversion functions and got a near 5x performance boost!

SergiusTheBest avatar Mar 14 '16 10:03 SergiusTheBest