cpython
cpython copied to clipboard
gh-99127: Port syslog module to use module state.
- Issue: gh-99127
(.oss) ➜ cpython git:(gh-99127) ✗ ./python.exe -m test test_syslog -R 3:3
Raised RLIMIT_NOFILE: 256 -> 1024
0:00:00 load avg: 3.65 Run tests sequentially
0:00:00 load avg: 3.65 [1/1] test_syslog
beginning 6 repetitions
123456
......
== Tests result: SUCCESS ==
1 test OK.
Total duration: 704 ms
Tests result: SUCCESS
Something to consider: what happens when multiple sub interpreters use the syslog extension. The current implementation always calls openlog(3) at least once before calling syslog(3), see the check in syslog_syslog_impl.
Thanks that's why I got a headache which makes me close the issue but thanks to suggest the solution.
IMHO the check in syslog_syslog_impl is not necessary because it results in a call to openlog(3) with parameters that match the default behaviour of the syslog library.
Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right? Please let me know if I understood it wrongly.
Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right?
Or did you intended implicit call of openlog(3) while calling syslog(3)?
:robot: New build scheduled with the buildbot fleet by @corona10 for commit 952ff46a8c0c790a09bb21d1badd24a37d5ca7bc :robot:
If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.
Failure tests from buildbot/PPC64LE Fedora were not related to this change.
Unfortunately I don't have time to review this right now. Maybe @serhiy-storchaka can have a look? He recently did some work on this module.
Yeah, IIUC we should call openlog(3) anyway while calling syslog.syslog right? Please let me know if I understood it wrongly.
At the C level calling openlog(3) before calling syslog(3) is not necessary, although the spec is not entirely clear about this. The linux manual page does clearly mention that openlog(3) is optional, and matches what I remember from older Unix systems.
The call to openlog in syslog_syslog_impl basically resets the default system state (use the process name for the ident and log using the LOG_USER facility). Leaving that out should therefore be harmless, but this is technically a user-visible change of behaviour:
- C code calls openlog(...) with same parameters
- Python code calls syslog.syslog
Currently the Python code resets the values set in the first step, with my proposal it would no longer do this. I'd consider this a good change, but there's bound to be someone affected by this.
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again
.
And if you don't make the requested changes, you will be poked with soft cushions!
I've posted to the issue about interpreters interfering with each other (which is not new with this PR): https://github.com/python/cpython/issues/99127#issuecomment-1306095643.
For the specific case of sub-interpreters, maybe sub-interpreters must not be allowed to call openlog() and closelog()?
But for the simple case of a single intereterpreter but loading the C extensions twice, there is a similar issue.
This problem reminds me the complicated case of the posix/nt extension being converted to multi-phase init. I got a very complicated issue about Python exit. Some applications started to crash. Oops. Python os.environ[key] = value stored a copy of "key=value" string... until Python exit. But if the application still runs after Python shutdown, reading the environment could lead to reading freed memory which leaded to a crash! See:
- https://github.com/python/cpython/issues/83576 issue.
- https://github.com/python/cpython/issues/83587
The fix? Let the C library handles memory for us: use setenv() function, rather than putenv()!
For me, the Python syslog module should be a thin wrapper to a glue between Python and the underlying C library: have something which handles openlog() / closelog(), keep memory around, and only frees this memory when the last interpreter is deleted. Maybe it can be a shared state with a reference counter?
- syslog instance 1 calls openlog(): copy arguments in heap memory
- syslog instance 2 calls openlog(): free old memory, copy new arguments
- ...
- last syslog instance exits: call closelog(), release memory
For me, it sounds weird and dangerous to still use closelog() in another syslog instance which remains alive. Maybe it works on some operating systems. But again, see the recent crash on macOS.
For example, the Python socket.socket
module has an _io_refs
reference counter: it's used to decide when sock.close()
will really close the underlying "raw" socket (C _socket.socket
module).
Yeah, it is usually best to restrict management of global resources to the main interpreter. I'd say the syslog API counts as a global resource, so I'd be good with syslog.openlog()
and syslog.closelog()
only working in the main interpreter. Maybe also disallow syslog.syslog()
in a subinterpreter if openlog()
hasn't been called yet (or maybe not).
FYI, it looks like logging.handlers.SysLogHandler
doesn't use the syslog module, so that shouldn't factor in.
so I'd be good with syslog.openlog() and syslog.closelog() only working in the main interpreter. Maybe also disallow syslog.syslog() in a subinterpreter if openlog() hasn't been called yet (or maybe not).
+1
I have made the requested changes; please review again
Thanks for making the requested changes!
@vstinner: please review the changes made to this pull request.
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again
.
I have made the requested changes; please review again
Thanks for making the requested changes!
@vstinner, @ericsnowcurrently: please review the changes made to this pull request.
@vstinner Any suggestion before merging this PR?
This limitation seems to be new in Python 3.12: can you document this incompatible change? Also, can you please document in the Python syslog module documentation the behavior/limitations of sub-interpreters?
Oh I missed this, I will push the documentation ASAP.
@vstinner Updated! PTAL
@ericsnowcurrently Would you like to take a look at the documentation update? I am not sure it will be natural for native speakers :)
When you're done making the requested changes, leave the comment: I have made the requested changes; please review again
.
I have made the requested changes; please review again
Thanks for making the requested changes!
@ericsnowcurrently, @vstinner: please review the changes made to this pull request.
@ronaldoussoren:
Why can syslog.syslog only be used when the main interpreter has called syslog.openlog? That's a restriction that is not present in the underlying library.
The limitation comes from Python which stores a Python object in a global variable:
/* only one instance, only one syslog, so globals should be ok */
static PyObject *S_ident_o = NULL; /* identifier, held by openlog() */
static char S_log_open = 0;
We must hold a strong reference because we pass a pointer into the internal UTF-8 encoded flavor of the Unicode string, PyUnicode_AsUTF8():
/* At this point, ident should be INCREF()ed. openlog(3) does not
* make a copy, and syslog(3) later uses it. We can't garbagecollect it.
* If NULL, just let openlog figure it out (probably using C argv[0]).
*/
if (ident) {
ident_str = PyUnicode_AsUTF8(ident);
if (ident_str == NULL) {
Py_DECREF(ident);
return NULL;
}
}
I understand that openlog() doesn't copy the ident string. Maybe this assumption is wrong and the whole change is not needed. I don't know. I don't want to check the implementation of openlog() on all supported platforms, and run a real test.
S_ident_o
global variable was added in 1998 by commit ae94cf292ba0317ac70a5eca3004a2fa5bcaad80. Previously, it was a static PyObject *ident_o = NULL;
in openlog() added in 1995 by commit c1822a4dd1b5276211be041c7ac216c549c787a4.
Ok, I'm curious and I looked into the GNU glibc: https://sourceware.org/git/?p=glibc.git;a=blob;f=misc/syslog.c;h=f67d4b58a448cabdf63f629f72a9df6e009286cf;hb=HEAD#l295
The ident string is not copied, only the pointer is copied (LogTag = ident;
). So if the pointer becomes a dangling pointer, syslog() will just crash when reading ident (LogTag
)
One problem with sub-interpreters is that Python objects should not travel from one interpreter to another. But static PyObject *S_ident_o
can become a memory copy of the UTF-8 encoded string instead, so we don't pass objects anymore.
For closelog(), I proposed earlier a counter increased by syslog.openlog() and decreased by syslog.closelog(). The C function closelog() would only be called when the counter reach zero. One problem is if two interepreters use different identifiers, the latest call to syslog.openlog() wins: it overrides the previous ident (and log level).
os.chdir() has a similar problem: it's a process-wide parameter affecting all threads and all interpreters. But it's a simpler problem. The current working directory is stored in the kernel, not in Python, and no resource have to be released explicitly at exit :-)
Agree, keeping a copy of ident in the raw dynamically allocated array will resolve the issue.
But I do not think we need a reference counter. Just call closelog()
and clear the saved ident string. If your program calls syslog.closelog()
multiple times, it perhaps has a trouble. But at least this will not cause a crash.
What about keeping a static buffer and copying the str object's C string into it, then giving the buffer to openlog()
? Then there's no deallocation to fuss with, we can drop S_log_open
and both syslog.syslog()
and syslog.closelog()
become simpler. I'm pretty sure the only downside is that we would introduce a hard-limit on the length of the identifier.
#define MAX_IDENT 256
static char S_ident[MAX_IDENT+1];
...
static PyObject *
syslog_openlog_impl(...)
{
....
const char *ident_str = NULL;
if (ident != NULL) {
Py_ssize_t size = PyUnicode_GET_LENGTH(ident);
if (size > MAX_IDENT) {
PyErr_SetString(...);
return NULL;
}
else if (size > 0) {
strncpy(S_ident, PyUnicode_AsUTF8(ident), size);
ident_str = S_ident;
}
}
openlog(ident_str, logopt, facility);
Py_RETURN_NONE;
}
static PyObject *
syslog_syslog_impl(...)
{
if (PySys_Audit("syslog.syslog", "is", priority, message) < 0) {
...
}
#ifdef __APPLE__
// gh-98178: On macOS, libc syslog() is not thread-safe
syslog(priority, "%s", message);
#else
Py_BEGIN_ALLOW_THREADS;
syslog(priority, "%s", message);
Py_END_ALLOW_THREADS;
#endif
Py_RETURN_NONE;
}
static PyObject *
syslog_closelog_impl(PyObject *module)
{
...
closelog();
Py_RETURN_NONE;
}
There would still be a race under per-interpreter GIL, though, if subinterpreters are allowed to call syslog.openlog()
and we don't have a process-global lock for the module.
@ronaldoussoren @vstinner @ericsnowcurrently
As the first step for the subinterpreter project, I would like to suggest maintaining restrictions for subinterpreter as the current PR. If there are requests which require allowing all of the features from subinterpreter side, let's discuss the way to support it. Eric's suggestion might be a good way to solve it but let's delay the decision for this moment.