cpython gh-99127: Port syslog module to use module state.

Issue: gh-99127

Nov 05 '22 13:11 corona10


(.oss) ➜  cpython git:(gh-99127) ✗ ./python.exe -m test test_syslog -R 3:3
Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 3.65 Run tests sequentially
0:00:00 load avg: 3.65 [1/1] test_syslog
beginning 6 repetitions
123456
......

== Tests result: SUCCESS ==

1 test OK.

Total duration: 704 ms
Tests result: SUCCESS

Nov 05 '22 13:11 corona10

Something to consider: what happens when multiple sub interpreters use the syslog extension. The current implementation always calls openlog(3) at least once before calling syslog(3), see the check in syslog_syslog_impl.

Thanks that's why I got a headache which makes me close the issue but thanks to suggest the solution.

IMHO the check in syslog_syslog_impl is not necessary because it results in a call to openlog(3) with parameters that match the default behaviour of the syslog library.

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right? Please let me know if I understood it wrongly.

Nov 05 '22 14:11 corona10

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right?

Or did you intended implicit call of openlog(3) while calling syslog(3)?

Nov 05 '22 15:11 corona10

:robot: New build scheduled with the buildbot fleet by @corona10 for commit 952ff46a8c0c790a09bb21d1badd24a37d5ca7bc :robot:

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

Nov 06 '22 03:11 bedevere-bot

Failure tests from buildbot/PPC64LE Fedora were not related to this change.

Nov 06 '22 14:11 corona10

Unfortunately I don't have time to review this right now. Maybe @serhiy-storchaka can have a look? He recently did some work on this module.

Nov 06 '22 19:11 erlend-aasland

Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right? Please let me know if I understood it wrongly.

At the C level calling openlog(3) before calling syslog(3) is not necessary, although the spec is not entirely clear about this. The linux manual page does clearly mention that openlog(3) is optional, and matches what I remember from older Unix systems.

The call to openlog in syslog_syslog_impl basically resets the default system state (use the process name for the ident and log using the LOG_USER facility). Leaving that out should therefore be harmless, but this is technically a user-visible change of behaviour:

C code calls openlog(...) with same parameters
Python code calls syslog.syslog

Currently the Python code resets the values set in the first step, with my proposal it would no longer do this. I'd consider this a good change, but there's bound to be someone affected by this.

Nov 06 '22 20:11 ronaldoussoren

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

And if you don't make the requested changes, you will be poked with soft cushions!

Nov 07 '22 16:11 bedevere-bot

I've posted to the issue about interpreters interfering with each other (which is not new with this PR): https://github.com/python/cpython/issues/99127#issuecomment-1306095643.

Nov 07 '22 19:11 ericsnowcurrently

For the specific case of sub-interpreters, maybe sub-interpreters must not be allowed to call openlog() and closelog()?

But for the simple case of a single intereterpreter but loading the C extensions twice, there is a similar issue.

This problem reminds me the complicated case of the posix/nt extension being converted to multi-phase init. I got a very complicated issue about Python exit. Some applications started to crash. Oops. Python os.environ[key] = value stored a copy of "key=value" string... until Python exit. But if the application still runs after Python shutdown, reading the environment could lead to reading freed memory which leaded to a crash! See:

https://github.com/python/cpython/issues/83576 issue.
https://github.com/python/cpython/issues/83587

The fix? Let the C library handles memory for us: use setenv() function, rather than putenv()!

For me, the Python syslog module should be a thin wrapper to a glue between Python and the underlying C library: have something which handles openlog() / closelog(), keep memory around, and only frees this memory when the last interpreter is deleted. Maybe it can be a shared state with a reference counter?

syslog instance 1 calls openlog(): copy arguments in heap memory
syslog instance 2 calls openlog(): free old memory, copy new arguments
...
last syslog instance exits: call closelog(), release memory

For me, it sounds weird and dangerous to still use closelog() in another syslog instance which remains alive. Maybe it works on some operating systems. But again, see the recent crash on macOS.

Nov 07 '22 21:11 vstinner

For example, the Python socket.socket module has an _io_refs reference counter: it's used to decide when sock.close() will really close the underlying "raw" socket (C _socket.socket module).

Nov 07 '22 21:11 vstinner

Yeah, it is usually best to restrict management of global resources to the main interpreter. I'd say the syslog API counts as a global resource, so I'd be good with syslog.openlog() and syslog.closelog() only working in the main interpreter. Maybe also disallow syslog.syslog() in a subinterpreter if openlog() hasn't been called yet (or maybe not).

Nov 08 '22 00:11 ericsnowcurrently

FYI, it looks like logging.handlers.SysLogHandler doesn't use the syslog module, so that shouldn't factor in.

Nov 08 '22 00:11 ericsnowcurrently

so I'd be good with syslog.openlog() and syslog.closelog() only working in the main interpreter. Maybe also disallow syslog.syslog() in a subinterpreter if openlog() hasn't been called yet (or maybe not).

+1

Nov 10 '22 03:11 corona10

I have made the requested changes; please review again

Nov 11 '22 06:11 corona10

Thanks for making the requested changes!

@vstinner: please review the changes made to this pull request.

Nov 11 '22 06:11 bedevere-bot

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Nov 11 '22 16:11 bedevere-bot

I have made the requested changes; please review again

Nov 12 '22 11:11 corona10

Thanks for making the requested changes!

@vstinner, @ericsnowcurrently: please review the changes made to this pull request.

Nov 12 '22 11:11 bedevere-bot

@vstinner Any suggestion before merging this PR?

Nov 16 '22 04:11 corona10

This limitation seems to be new in Python 3.12: can you document this incompatible change? Also, can you please document in the Python syslog module documentation the behavior/limitations of sub-interpreters?

Oh I missed this, I will push the documentation ASAP.

Nov 17 '22 01:11 corona10

@vstinner Updated! PTAL

@ericsnowcurrently Would you like to take a look at the documentation update? I am not sure it will be natural for native speakers :)

Nov 17 '22 13:11 corona10

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

Nov 17 '22 23:11 bedevere-bot

I have made the requested changes; please review again

Nov 18 '22 05:11 corona10

Thanks for making the requested changes!

@ericsnowcurrently, @vstinner: please review the changes made to this pull request.

Nov 18 '22 05:11 bedevere-bot

@ronaldoussoren:

Why can syslog.syslog only be used when the main interpreter has called syslog.openlog? That's a restriction that is not present in the underlying library.

The limitation comes from Python which stores a Python object in a global variable:

/*  only one instance, only one syslog, so globals should be ok  */
static PyObject *S_ident_o = NULL;                      /*  identifier, held by openlog()  */
static char S_log_open = 0;

We must hold a strong reference because we pass a pointer into the internal UTF-8 encoded flavor of the Unicode string, PyUnicode_AsUTF8():

    /* At this point, ident should be INCREF()ed.  openlog(3) does not
     * make a copy, and syslog(3) later uses it.  We can't garbagecollect it.
     * If NULL, just let openlog figure it out (probably using C argv[0]).
     */
    if (ident) {
        ident_str = PyUnicode_AsUTF8(ident);
        if (ident_str == NULL) {
            Py_DECREF(ident);
            return NULL;
        }
    }

I understand that openlog() doesn't copy the ident string. Maybe this assumption is wrong and the whole change is not needed. I don't know. I don't want to check the implementation of openlog() on all supported platforms, and run a real test.

S_ident_o global variable was added in 1998 by commit ae94cf292ba0317ac70a5eca3004a2fa5bcaad80. Previously, it was a static PyObject *ident_o = NULL; in openlog() added in 1995 by commit c1822a4dd1b5276211be041c7ac216c549c787a4.

Ok, I'm curious and I looked into the GNU glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=misc/syslog.c;h=f67d4b58a448cabdf63f629f72a9df6e009286cf;hb=HEAD#l295

The ident string is not copied, only the pointer is copied (LogTag = ident;). So if the pointer becomes a dangling pointer, syslog() will just crash when reading ident (LogTag)

Nov 18 '22 06:11 vstinner

One problem with sub-interpreters is that Python objects should not travel from one interpreter to another. But static PyObject *S_ident_o can become a memory copy of the UTF-8 encoded string instead, so we don't pass objects anymore.

For closelog(), I proposed earlier a counter increased by syslog.openlog() and decreased by syslog.closelog(). The C function closelog() would only be called when the counter reach zero. One problem is if two interepreters use different identifiers, the latest call to syslog.openlog() wins: it overrides the previous ident (and log level).

os.chdir() has a similar problem: it's a process-wide parameter affecting all threads and all interpreters. But it's a simpler problem. The current working directory is stored in the kernel, not in Python, and no resource have to be released explicitly at exit :-)

Nov 22 '22 07:11 vstinner

Agree, keeping a copy of ident in the raw dynamically allocated array will resolve the issue.

But I do not think we need a reference counter. Just call closelog() and clear the saved ident string. If your program calls syslog.closelog() multiple times, it perhaps has a trouble. But at least this will not cause a crash.

Nov 22 '22 15:11 serhiy-storchaka

What about keeping a static buffer and copying the str object's C string into it, then giving the buffer to openlog()? Then there's no deallocation to fuss with, we can drop S_log_open and both syslog.syslog() and syslog.closelog() become simpler. I'm pretty sure the only downside is that we would introduce a hard-limit on the length of the identifier.

#define MAX_IDENT 256
static char S_ident[MAX_IDENT+1];

...

static PyObject *
syslog_openlog_impl(...)
{
    ....
    const char *ident_str = NULL;
    if (ident != NULL) {
        Py_ssize_t size = PyUnicode_GET_LENGTH(ident);
        if (size > MAX_IDENT) {
            PyErr_SetString(...);
            return NULL;
        }
        else if (size > 0) {
            strncpy(S_ident, PyUnicode_AsUTF8(ident), size);
            ident_str = S_ident;
        }
    }
    openlog(ident_str, logopt, facility);
    Py_RETURN_NONE;
}

static PyObject *
syslog_syslog_impl(...)
{
    if (PySys_Audit("syslog.syslog", "is", priority, message) < 0) {
        ...
    }

#ifdef __APPLE__
    // gh-98178: On macOS, libc syslog() is not thread-safe
    syslog(priority, "%s", message);
#else
    Py_BEGIN_ALLOW_THREADS;
    syslog(priority, "%s", message);
    Py_END_ALLOW_THREADS;
#endif
    Py_RETURN_NONE;
}

static PyObject *
syslog_closelog_impl(PyObject *module)
{
    ...
    closelog();
    Py_RETURN_NONE;
}

There would still be a race under per-interpreter GIL, though, if subinterpreters are allowed to call syslog.openlog() and we don't have a process-global lock for the module.

Nov 22 '22 16:11 ericsnowcurrently

@ronaldoussoren @vstinner @ericsnowcurrently

As the first step for the subinterpreter project, I would like to suggest maintaining restrictions for subinterpreter as the current PR. If there are requests which require allowing all of the features from subinterpreter side, let's discuss the way to support it. Eric's suggestion might be a good way to solve it but let's delay the decision for this moment.

Nov 24 '22 15:11 corona10

cpython cpython copied to clipboard

gh-99127: Port syslog module to use module state.

cpython
cpython copied to clipboard