cpython icon indicating copy to clipboard operation
cpython copied to clipboard

Subtle issue with borrowed references in extensions.

Open anamax22 opened this issue 1 year ago • 12 comments

I think that the discussion of borrowing in Section 1.10 makes an untrue assumption.

In specific, I think that "When you pass an object reference into another function, in general, the function borrows the reference from you" is incomplete/misleading.

More to the point, "When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns." seems wrong.

I don't have a clean documentation fix but I do have an example that demonstrates the problem.

Section 1.10.3 of https://docs.python.org/3/extending/extending.html ends with:

void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}

It's a second example of "Could it perhaps do something to invalidate the reference to item in bug()? You bet! Assuming that the list passed into bug() is accessible to the del() method, it could execute a statement to the effect of del list[0], and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating item." (from earlier in that section).

However, I'm reasonably certain that the following is also broken in a way that the discussion of borrowing (from the caller) doesn't address.

void
bug2(PyObject *list)
{
    Py_BEGIN_ALLOW_THREADS
    ...let other threads run during some blocking I/O call...
    Py_END_ALLOW_THREADS

    # Can the last reference to list be deleted by a thread that ran during
    # said blocking I/O call?
    # If so, this next line is likely to cause problems.
    PyObject *item = PyList_GetItem(list, 0);
}

anamax22 avatar Aug 08 '22 16:08 anamax22

The issue you sketch is a reverse scenario from the documentation you quote.

The sentence When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. is about this:

PyObject* value = PyLong_FromLong(42); // *value* is a owned reference

SomeAPI(value);

Here SomeAPI "borrows" the reference from its caller, in that it assumes that the caller has an owning reference.

In value = PyList_GetItem(someList, 42) the value is a borrowed reference returned from an API, and that indeed has the problems you sketch: Some other location owns the reference (in this case the list object) and that reference get invalidated when you call back into the interpreter (which might resize the list or replace this item).

Note that the text, and other examples, in 1.10.3 more clearly explain the problem then I do here.

ronaldoussoren avatar Aug 08 '22 16:08 ronaldoussoren

The examples in 1.10.3 are about list's elements. They point out that while list was borrowed, its elements weren't.

My question is whether the caller's reference to list is guaranteed to survive running other threads.

If not, "the borrowed reference’s lifetime is guaranteed until the function returns" is not true.

I don't see any text in 1.10 that addresses the possibility that a borrowed reference can go away, that its lifetime actually isn't guaranteed.

anamax22 avatar Aug 09 '22 03:08 anamax22

You are correct. Functions that return borrowed references are tricky to use correctly: you need to ensure that whatever you're borrowing from stays valid, and that its reference stays valid (which might involve relying on implementation details). You don't even need threads for things to go wrong: even printing out a value can call custom __str__, which can clear a list and invalidate a borrowed reference.

Functions that return borrowed references have been available for a long time, are very widely used, and enable faster code than safer variants, so they're not going away. But we can definitely do better in documenting the issues and giving safer alternatives.

Things are different when getting a borrowed reference directly as a function argument (list, not item, in your example). There, the caller is responsible for keeping the reference valid for the entire call, and that is generally not hard to do.

encukou avatar Aug 09 '22 08:08 encukou

There seems to be some confusion. I'm not talking about functions that return borrowed references. I'm talking about a case when borrowing doesn't work.

Consider the following:

# change() is called continuously by another thread
_GLOBAL = list('first')
def change():
    global _GLOBAL
    _GLOBAL = list('new')

def caller():
    called(_GLOBAL)

while True:
    caller()

Everything is safe if called is implemented in python because initializing the parameter creates a new reference, even if the python called lets other threads run before accessing the said parameter's value.

However, if called is an extension function, its borrowed reference is probably invalid after it lets other threads run.

I think that the only safe thing for extension functions is to create a new reference to all parameters before letting other threads run.

Yes, another way to solve this problem is to implement caller as:

def caller():
    temp = _GLOBAL
    called(temp)

But, that assumes that caller knows a lot about called's implementation.

anamax22 avatar Aug 09 '22 15:08 anamax22

The safe thing is to create a new reference after calling PyList_GetItem (a function that is problematic because it returns a borrowed reference), or to instead call PySequence_GetItem (which returns a new reference). It's then necessary to decref the item when the called function no longer needs it. Creating references to all arguments is not necessary.

encukou avatar Aug 09 '22 15:08 encukou

I'll it say again - I'm not talking about the elements/components of an argument. I'm talking about the argument as a whole, which is the actual borrowed reference.

The same problem happens with floating point numbers.

_GLOBAL = 100.9
def changer():
    global _GLOBAL
    _GLOBAL = 1.1+ _GLOBAL

arrange to have "changer" called continuously by a thread.

while True:
    reader(_GLOBAL)

If "reader" is written in python, everything is okay even if "reader" lets other threads, and thus "changer", run.

If "reader" is an extension, the borrowed reference to _GLOBAL's value is invalid after "changer" is called while "reader" lets other threads run.

anamax22 avatar Aug 09 '22 16:08 anamax22

I think we all agree here: If you get a borrowed reference as the result of calling a function you need to be very careful when using that value. In general you'll have to incref before using it with other APIs, and decref afterwards (although I've seen a lot of code that doesn't do this because the C extension is only used in a context where this is not a problem).

That's explained at the start of 1.10.3, in an informal way.

Is there anything actionable to resolve this issue?

ronaldoussoren avatar Aug 10 '22 08:08 ronaldoussoren

Minimum: "The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on, it must be turned into an owned reference by calling Py_INCREF()."

to something like.

"A borrowed reference lives only as long as the caller's reference. (That reference can go away if other threads are allowed to run.) In addition, a borrowed reference must be turned into an owned reference by calling Py_INCREF() if it is stored or passed on.)"

I don't know whether it is worth including my last example.

anamax22 avatar Aug 10 '22 12:08 anamax22

The borrowed reference that's passed as an argument must not go away until the called function returns, regardless of other threads. Since the caller must hold a reference, other threads cannot delete the last reference. If the caller somehow allows other threads to invalidate its reference, it's a bug in the caller.

Perhaps the misunderstanding comes from the fact that a reference's “borrowed-ness” depends on the context? An argument is a borrowed reference in the called function, but in the caller it's not (necessarily) a borrowed reference?

encukou avatar Aug 10 '22 12:08 encukou

"borrowed" primarily means that you don't have to decref when you're done with the reference, while you do have to decref for an "owned" reference. For arguments there's the additional convention that there should be a owned reference somewhere up the call stack (which means the callee can assume the reference stays valid until it returns).

ronaldoussoren avatar Aug 10 '22 13:08 ronaldoussoren

I'm surprised by the assertion that the caller is responsible for knowing when "callee(_GLOBAL)" is safe.

If callee is written in python, it is safe.

If callee is an extension, it seems reasonable to expect its implementation to provide the same guarantee.

anamax22 avatar Aug 13 '22 20:08 anamax22

The caller doesn't need to know anything about callee. It does need to ensure that the reference that it passes to callee remains valid until callee returns.

Maybe misunderstanding about what the function would look like when "written in C"? I assume the function:

def caller():
    called(_GLOBAL)

should be written in C as something like:

PyObject* caller(PyObject* module)
{
    PyObject* global = PyObject_GetAttrString(module, "_GLOBAL");
    if (!global) return NULL;
    PyObject* result = called(global);
    Py_DECREF(global);
    if (!result) return NULL;
    Py_RETURN_NONE;
}

or do you have something else in mind?

encukou avatar Aug 15 '22 07:08 encukou

Please don't rewrite the caller. callee(1.0+35), callee(_GLOBAL), callee(local) should all work.

The caller doesn't need to know anything about callee. It does need to ensure that the reference that it passes to callee remains valid until callee returns.

The caller doesn't need to do anything to keep the reference alive if the callee is written in python, even if the callee lets other threads run.

The same should be true if the callee is written in C, and it is possible to write the callee in C such that that is true.

My point is that the documentation says something about borrowed references that suggests that the callee need not take certain precautions if it lets other threads run.

anamax22 avatar Aug 17 '22 23:08 anamax22

Please don't rewrite the caller

So, what is the C version of caller that we should be talking about? So far you only gave Python implementation, so to get a C implementation I need to rewrite it.

encukou avatar Aug 18 '22 08:08 encukou

The Python implementation of the function does not have this problem because the bytecode interpreter uses owned references as needed.

My point is that the documentation says something about borrowed references that suggests that the callee need not take certain precautions if it lets other threads run.

The documentation seems to be clear about this (if a bit informal), but I write this as someone that already knows the rules by heart.

Note that threading is a red herring here, the problem is also there in serial code that can call back into Python, e.g.

PyObject* item = PyList_GetItem(someList, 4)
PyObject_Call(someFunction, ...)
PyObject_Print(item, stdout, 0);

If someFunction can access someList (because its in a variable or attribute) it can do del someList[4] and invalidate item.

That's something you have to be very careful about when writing C extensions. As @encukou wrote earlier you can side-step the issue here by using the abstract object item (PySequene_GetItem) instead of the list-specific API because that returns an owned reference.

ronaldoussoren avatar Aug 18 '22 09:08 ronaldoussoren

[1] The caller's implementation is irrelevant because callees should work regardless of the caller. (That's true when the callee is written in python. It should also be true when the callee is an extension. To put it another way, an extension should be just as safe to call as python.) [2] I switched to a float argument because the issue has nothing to do with items or other subcomponents of the borrowed argument. The caller's reference to the borrowed argument itself can go away in two circumstances. (I mentioned threading but ronaldoussoren points out that callbacks can have the same effect.) The documentation says that it can't.

anamax22 avatar Aug 18 '22 16:08 anamax22

This problem is documented In https://docs.python.org/3/extending/extending.html#thin-ice. This section does not contain an exhaustive list of scenario's where borrowed references can cause problems, but is (IMHO) pretty clear.

To get back to your initial report: In your "bug2" example there is in general not a problem because the convention is that the caller should ensure it owns a reference (or borrows a reference from its caller as long as there is a strong owner somewhere up the call chain). That's something we cannot enforce, so yes it is possible to do something like bug2(PyList_GetItem(list_of_lists, 0)); which could end up invalidating the reference while giving up the GIL. Don't do that...

If you are worried that callers of your C function might pass in a reference they got from calling a function that returns a borrowed reference you have to incref the value before giving up the GIL or calling a C API (and decref again before returning). Note that this is often not a problem, unless you want to be absolutely bullet proof. I've written enough in-house C extensions where I didn't worry about this issue because I knew enough of the rest of the system to know that the borrowed reference wouldn't be invalidated before I was done with it.

Bulletproof manual reference counting is a chore, which is why a lot of people currently advise to use Cython to write extensions, that way you get the write extensions in a Python-like language and don't have to worry about this issue yourself.

And a final note: it is well known that functions returning borrowed references are a (too) sharp edge in the C API, and in hindsight shouldn't have been in the public API (that latter part is my personal opinion, not necessarily the opinion of the project as a whole). See also #86460.

ronaldoussoren avatar Aug 18 '22 21:08 ronaldoussoren

The discussion indicates that the issue reported it not clear.

@anamax22 can you show a problem with reference lifetimes in a running program, i.e., a c extension plus a multithreaded python program that clobbers the references to a c function call?

I am marking this as pending and it will be closed after a while if there will be no followup.

iritkatriel avatar May 12 '23 10:05 iritkatriel