
EnableLogCleaner() interface is not thread safe?

Open kimi20071025 opened this issue 4 years ago • 16 comments

I read the functions EnableLogCleaner, DisableLogCleaner, and LogCleaner::enabled().

I find that the private member enabled_ of LogCleaner is not guarded by a lock.

If one thread calls EnableLogCleaner while another thread is logging, may this lead to undefined behaviour?

kimi20071025 avatar Dec 07 '21 03:12 kimi20071025

Could you please point out the exact location where the problem occurs?

sergiud avatar Dec 07 '21 08:12 sergiud

Could you please point out the exact location where the problem occurs?

When one thread is logging to a file, it eventually calls LogFileObject::Write(), which calls log_cleaner.enabled(). log_cleaner.enabled() reads enabled_ without holding a lock.

Meanwhile, another thread may change enabled_ by calling EnableLogCleaner or DisableLogCleaner, which write enabled_ without holding a lock either.

So maybe this leads to undefined behaviour? I am just in doubt about this; maybe I am wrong.

Thanks for the response.

kimi20071025 avatar Dec 08 '21 00:12 kimi20071025

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

sergiud avatar Dec 09 '21 09:12 sergiud

Is it possible that you mean a race condition and not undefined behavior? The former can occur because EnableLogCleaner and DisableLogCleaner are not atomic. However, I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

cc @aesophor

Yes. One thread is reading enabled_ while another is writing it. May the race condition lead to undefined behavior?

Or does it not lead to undefined behavior because enabled_ is a bool?

kimi20071025 avatar Dec 10 '21 07:12 kimi20071025

I'm not sure what you are asking. The fact that a boolean is not atomic does not immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

sergiud avatar Dec 10 '21 09:12 sergiud

I'm not sure what you are asking. The fact that a boolean is not atomic does not immediately cause undefined behavior. Also, undefined behavior is generally not prescribed by a library but by the language standard.

OK. If enabled_ were a uint8_t, would the race condition probably cause undefined behavior?

kimi20071025 avatar Dec 13 '21 03:12 kimi20071025

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something where this race condition can be triggered and is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ an std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the value of LogCleaner::overdue_days_ at the end can be nondeterministic. However, if we use a mutex to protect the entire bodies of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?

aesophor avatar Dec 13 '21 04:12 aesophor
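One possible middle ground, sketched below under the assumption that enabled() reads vastly outnumber Enable()/Disable() writes (this is not glog's actual code; the class and member names merely mirror the discussion): keep the hot-path enabled() check lock-free with std::atomic<bool>, and take a mutex only in the rarely-called writers so that enabled_ and overdue_days_ are updated consistently.

```cpp
#include <atomic>
#include <mutex>

class LogCleaner {
 public:
  // Writers are rare (enable/disable), so a mutex here costs nothing
  // on the logging fast path. It also keeps enabled_ and
  // overdue_days_ consistent when two threads call Enable().
  void Enable(int overdue_days) {
    std::lock_guard<std::mutex> lock(mutex_);
    overdue_days_ = overdue_days;
    enabled_.store(true, std::memory_order_release);
  }

  void Disable() {
    std::lock_guard<std::mutex> lock(mutex_);
    enabled_.store(false, std::memory_order_release);
  }

  // Called on every write to the log file: a single atomic load,
  // no lock taken.
  bool enabled() const {
    return enabled_.load(std::memory_order_acquire);
  }

 private:
  std::mutex mutex_;  // guards the writers only
  std::atomic<bool> enabled_{false};
  int overdue_days_ = 7;
};
```

The logging fast path then pays one atomic load per record, while concurrent Enable(int) calls are serialized by the mutex.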

Quoting from @sergiud:

I believe the race condition is tolerable because at most what can happen are missed cleanups which are made up for once EnableLogCleaner is called again.

@kimi20071025 We understand the possible race condition you've mentioned. Are you working on something where this race condition can be triggered and is intolerable?

IMO, if we want to eliminate the race condition completely, just making enabled_ an std::atomic is probably not enough: imagine two threads calling LogCleaner::Enable(int days); the value of LogCleaner::overdue_days_ at the end can be nondeterministic. However, if we use a mutex to protect the entire bodies of both LogCleaner::Enable() and LogCleaner::enabled(), it will add overhead for those who use this API from a single thread.

In this case, is there a better option other than a mutex?

What I am worried about is not a race condition but a data race.

The glog library tries to clean overdue logs every time it logs. I think this may cost a lot of performance if the LogCleaner is enabled all the time. So once a day, at night, I enable the log cleaner (EnableLogCleaner), do an error logging, and then disable the log cleaner (DisableLogCleaner).

When a thread is logging, it finally calls LogFileObject::Write, which calls log_cleaner.enabled(); that is, the thread reads enabled_.

Meanwhile, another thread calls EnableLogCleaner or DisableLogCleaner and writes enabled_.

So what I am worried about is that this situation may cause a data race.

kimi20071025 avatar Dec 13 '21 07:12 kimi20071025
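The pattern being described can be reduced to a small standalone sketch (the names below are stand-ins for glog's internals, not the actual glog code). Built with -fsanitize=thread, ThreadSanitizer reports the unsynchronized read/write as a data race:

```cpp
#include <thread>

// Stand-in for LogCleaner::enabled_: a plain, non-atomic bool
// shared between threads.
bool enabled_ = false;

bool enabled() { return enabled_; }   // models log_cleaner.enabled()
void Enable()  { enabled_ = true;  }  // models EnableLogCleaner()
void Disable() { enabled_ = false; }  // models DisableLogCleaner()

// Running a reader and a writer concurrently over these functions is
// a data race per the C++ standard's [intro.races], even though the
// program usually "works" on common hardware. Compile with
// -fsanitize=thread to have ThreadSanitizer flag it.
void DemonstrateRace() {
  std::thread writer([] {
    for (int i = 0; i < 100000; ++i) { Enable(); Disable(); }
  });
  std::thread reader([] {
    volatile bool observed = false;
    for (int i = 0; i < 100000; ++i) observed = enabled();
    (void)observed;
  });
  writer.join();
  reader.join();
}
```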

I had to actually look this up. But it seems that unsynchronized concurrent reads and writes of the same variable are indeed undefined behavior in C++.

However, at this point I would rather claim that your use case is not supported, in the sense that you should not be calling EnableLogCleaner and DisableLogCleaner from different threads. As @aesophor already mentioned, a mutex is necessary to solve the data race, which would however incur an overhead at little or no gain. We can certainly reconsider once we have the means for proper synchronization (i.e., C++11 <thread> functionality). But at this point, the effort of making these functions thread safe is not worth it.

I think it may cost much performance

How did you determine the performance penalty, or is this only a guess? If it's the latter, you should measure quantitatively first and let us know how strong the performance degradation is if the cleaner has been enabled. This is something worth improving.

sergiud avatar Dec 13 '21 10:12 sergiud

About the log cleaner performance, I share the same opinion as issue #702: enabling log cleaning costs a lot of performance.

I also read the benchmark in #761. Could @aesophor help make another benchmark comparing logging performance with the log cleaner enabled versus disabled?

If there is a large performance penalty compared with the cleaner disabled, it means the LogCleaner cannot be enabled all the time.

kimi20071025 avatar Dec 15 '21 06:12 kimi20071025

Platform:

  • Arch Linux
  • CPU: Intel i7-10700 CPU @ 2.90GHz (16-core)
  • Memory: 16 GB ...

When there are 100 files in the log directory, the execution time of LogCleaner::Run() is 0.157 ms on average.

Upon each call to LogFileObject::Write(), if the log cleaner is enabled and there are 100 files to check, it takes only an extra 0.157 ms to run.

aesophor avatar Dec 15 '21 07:12 aesophor

About the log cleaner performance, I share the same opinion as issue #702: enabling log cleaning costs a lot of performance.

What exactly is "a lot of performance" in absolute numbers? Issue #702 did not provide any details either. Therefore I don't see any factual arguments.

sergiud avatar Dec 15 '21 07:12 sergiud

Platform

  • Arch: Linux
  • CPU: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz (4 cores)
  • Memory: 4 GB

Test code

#include "glog/logging.h"
#include <unistd.h>
#include <chrono>
#include <iostream>

using namespace std;
using namespace chrono;

int main(int argc, char *argv[])
{
    google::InitGoogleLogging("");
    FLAGS_logbufsecs = 0;
    FLAGS_max_log_size = 100;

    // Log only ERROR (and above) to a file under /tmp/glog/.
    google::SetLogDestination(google::GLOG_ERROR, "/tmp/glog/ERROR_");
    google::SetLogDestination(google::GLOG_WARNING, "");
    google::SetLogDestination(google::GLOG_INFO, "");

    FLAGS_colorlogtostderr = false;
    FLAGS_alsologtostderr = false;
    google::SetStderrLogging(google::GLOG_FATAL);

    // Measure 50,000 log calls with the log cleaner enabled.
    google::EnableLogCleaner(1);

    auto enableLogCleanStart = system_clock::now();
    for (uint32_t i = 0; i < 50000; i++)
    {
        LOG(ERROR) << "glog error";
    }
    auto enableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> enableLogCleanDuration =
        enableLogCleanEnd - enableLogCleanStart;
    cout << "cost: " << enableLogCleanDuration.count()
         << " ms when enable  log clean" << endl;

    // Measure the same workload with the log cleaner disabled.
    google::DisableLogCleaner();

    auto disableLogCleanStart = system_clock::now();
    for (uint32_t i = 0; i < 50000; i++)
    {
        LOG(ERROR) << "glog error";
    }
    auto disableLogCleanEnd = system_clock::now();

    std::chrono::duration<double, std::milli> disableLogCleanDuration =
        disableLogCleanEnd - disableLogCleanStart;
    cout << "cost: " << disableLogCleanDuration.count()
         << " ms when disable log clean" << endl;

    return 0;
}

Result

cost: 518.541 ms when enable  log clean
cost: 225.845 ms when disable log clean

Conclusion

When there is one file in the log directory and the log cleaner is enabled, logging is about 2.3x slower (518.5 ms vs. 225.8 ms for 50,000 log calls) than with the log cleaner disabled.

kimi20071025 avatar Dec 16 '21 02:12 kimi20071025

I believe that the overhead (518.541 ms vs. 225.845 ms) arises because:

  • you write 50,000 times, and each write triggers the log cleaner.
  • each run of the log cleaner therefore adds approx 0.0059 ms: (518.541 ms − 225.845 ms) / 50000 ≈ 0.0059 ms.

If your application needs to write this frequently but is performance-critical at the same time, you can use another thread to occasionally perform cleanup. You are also welcome to submit a PR that improves this.

aesophor avatar Dec 16 '21 03:12 aesophor
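The suggestion to perform cleanup from another thread could be sketched as follows. PeriodicCleaner is a hypothetical helper, not part of glog; the callback is where one would briefly enable the cleaner, log, and disable it again. This keeps directory scans off the logging hot path, although (as noted elsewhere in this thread) toggling the cleaner from a separate thread still leaves the unsynchronized read of enabled_ by logging threads in place.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical helper: invokes a cleanup callback on a dedicated
// thread once per period, so LOG() calls never pay for the scan.
class PeriodicCleaner {
 public:
  PeriodicCleaner(std::chrono::milliseconds period,
                  std::function<void()> cleanup)
      : worker_([this, period, cleanup = std::move(cleanup)] {
          std::unique_lock<std::mutex> lock(mutex_);
          while (!stop_) {
            lock.unlock();
            cleanup();  // e.g. EnableLogCleaner + log + DisableLogCleaner
            lock.lock();
            // Sleep for the period, but wake early on shutdown.
            cv_.wait_for(lock, period, [this] { return stop_; });
          }
        }) {}

  ~PeriodicCleaner() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    worker_.join();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;       // guarded by mutex_
  std::thread worker_;      // must be declared last: uses the above
};
```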

A possible optimization strategy: use CycleClock_Now() to maintain an additional timer such as next_cleanup_time_ .

aesophor avatar Dec 16 '21 04:12 aesophor
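The next_cleanup_time_ idea can be sketched with std::chrono::steady_clock standing in for glog's internal CycleClock_Now() (the class and method names here are illustrative, not glog's). The write path would call MaybeRunCleanup() on every record, but the actual directory scan runs at most once per interval:

```cpp
#include <chrono>

// Sketch of the proposed throttle: a cheap time comparison on every
// write, with the expensive scan gated behind next_cleanup_time_.
class ThrottledCleaner {
 public:
  explicit ThrottledCleaner(std::chrono::seconds interval)
      : interval_(interval),
        next_cleanup_time_(std::chrono::steady_clock::now()) {}

  // Returns true when a real cleanup pass was performed.
  bool MaybeRunCleanup() {
    const auto now = std::chrono::steady_clock::now();
    if (now < next_cleanup_time_) return false;  // cheap early-out
    next_cleanup_time_ = now + interval_;
    RunCleanup();
    return true;
  }

  int cleanups_performed() const { return cleanups_; }

 private:
  void RunCleanup() { ++cleanups_; }  // placeholder for the real scan

  std::chrono::seconds interval_;
  std::chrono::steady_clock::time_point next_cleanup_time_;
  int cleanups_ = 0;
};
```

With, say, a one-hour interval, 50,000 consecutive log calls would trigger a single scan instead of 50,000.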

you can use another thread to occasionally perform cleanup

This is what I already do, but the data race on enabled_ should be avoided, as I said at the beginning of this issue.

kimi20071025 avatar Dec 16 '21 09:12 kimi20071025