`ValueError` in a finalizer on PyPy
When running code that works fine on CPython under PyPy, I'm getting a ValueError raised here:
def __registerFinalizer(self, handle, finalizer):
    if handle in self.__finalizer_dict:
        finalizer.detach()
        raise ValueError
    self.__finalizer_dict[handle] = finalizer
Is this just a remnant of some debug code that happens to work by chance on CPython, or is there an actual reason to raise it? I'm very confused.
I'm using python-libusb1 3.1.0.
This is not a remnant, but a way to cleanly fail instead of having the finalizer randomly break (ex: on interpreter shutdown). __finalizer_dict exists ~~both to enforce the finalization order between instances (basically: the USB context object must be kept alive so that any object obtained from it can be finalized) and~~ to allow the "owner" instance to cascade closure requests to its dependents (closing a device ~~handle~~ closes all transfers related to that device).
So if two finalizers fall into the same entry in __finalizer_dict, it is a bug, and this raise is here to alert about it before it becomes the usual nightmare of garbage-collection-related heisenbugs, which themselves could turn into segfaults when libusb calls happen in the wrong order.
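To illustrate the intent, here is a minimal sketch (hypothetical names, not the actual python-libusb1 internals) of both the duplicate-key check and the cascade, assuming finalizers are weakref.finalize-style objects with detach() and __call__():

class _Owner:
    def __init__(self):
        self.__finalizer_dict = {}

    def _registerFinalizer(self, handle, finalizer):
        if handle in self.__finalizer_dict:
            # Duplicate key: detach the incoming finalizer so it cannot
            # fire later, then fail loudly rather than corrupt the
            # bookkeeping.
            finalizer.detach()
            raise ValueError
        self.__finalizer_dict[handle] = finalizer

    def close(self):
        # Cascade: run every dependent finalizer before the owner goes
        # away, so libusb calls happen in a safe order.
        while self.__finalizer_dict:
            _, finalizer = self.__finalizer_dict.popitem()
            finalizer()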
Could you post more of the traceback which leads to this failure ? Especially, I wonder which class' __registerFinalizer this is about, and what the caller is so I can figure out what the handle is computed from and hence guess why it may be reused.
Also, do you have a suggestion to clarify the intent of those raises, so they do not look like they may be leftover debugging code ?
EDIT: finalization order happens because references to the required objects are held by the finalizer itself, not by __finalizer_dict.
Especially, I wonder which class' __registerFinalizer this is about, and what the caller is so I can figure out what the handle is computed from and hence guess why it may be reused.
That was USBDeviceHandle. Probably I got some other exception and it crashed in the finalizer instead of whichever other way it was going to...
Is there a way for me to try to replicate this issue ? I unfortunately have very little time these days (which I would otherwise spend trying to get the new python-libusb1 version out, master has several improvements which were long overdue) but maybe I can figure something out.
At least, I did a quick try of the python-functionfs usbcat example (running device.py with CPython3 and host.py with pypy3 7.3.14), using kernel's dummy_hcd, sent a few packets back and forth and did not get any obvious failure.
In case you would like to reproduce, it goes like this:
Terminal 1, in python-functionfs:
$ sudo modprobe dummy_hcd
$ sudo modprobe libcomposite
$ virtualenv vpy3
$ vpy3/bin/pip install .
$ sudo vpy3/bin/python examples/usbcat/device.py
Terminal 2, in python-libusb1:
$ virtualenv -p pypy3 vpypy3
$ vpypy3/bin/pip install .
$ sudo vpypy3/bin/python .../examples/usbcat/host.py
But this may be too simple of an example to have a chance of triggering the issue.
Is there a way for me to try to replicate this issue ?
Unfortunately the way I discovered it is by running a userspace RGMII PHY driver with Glasgow on PyPy3.9.
on PyPy3.9
This may be the reason: I use object ids as keys, and this is where you triggered a duplicate value conflict. IIRC object lifecycles in PyPy are subtly different from CPython, and I would not be too surprised if this affected finalizer call order.
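For illustration, a quick sketch of why id() makes a fragile dictionary key (plain CPython behaviour, nothing python-libusb1-specific): once an object is collected, its id may be handed out again.

class Handle:
    pass

a = Handle()
old_id = id(a)
del a          # the object is collected, its memory slot is freed
b = Handle()   # CPython will often reuse the freed slot
print(old_id == id(b))  # frequently True on CPython; never guaranteed
                        # either way, and the timing differs on PyPy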
Here are some things I can think of:
- Adding randomness to the value used to keep track of the finalizers. This feels ugly though.
- Triggering a GC when a conflict happens, and re-checking for the conflict (sketched below) ? It should trigger infrequently, but may get in the way of higher-level code (ex: maybe the caller is trying to debug a memory leak and disabled the GC on purpose).
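Something like this, with hypothetical names:

import gc

def register(finalizer_dict, handle, finalizer):
    if handle in finalizer_dict:
        gc.collect()  # may run pending finalizers, freeing the stale key
    if handle in finalizer_dict:
        finalizer.detach()
        raise ValueError
    finalizer_dict[handle] = finalizer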
No better idea so far.
Can you assign a property to an object containing a sequential numeric id, or will that get destroyed before the finalizer is called?
Excellent idea. This is not only much simpler, but it showed me a way to simplify the entire thing.
I ended up using a single itertools.count, stored as a class attribute: dab4906eac9ad61613e21b90ce7279204efa33ab
This should resolve this issue, not only because of the removal of the reliance on object id, but also because I removed the raise statements (because if itertools.count produces duplicate values, I'm not sure anything meaningful can be done).
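In spirit, the new scheme looks like this (a simplified sketch, not the committed code):

import itertools

class _FinalizerOwner:
    _finalizer_id_generator = itertools.count()  # shared by every instance

    def __init__(self):
        self.__finalizer_dict = {}

    def _registerFinalizer(self, finalizer):
        # Keys now come from a monotonic counter, never from id(), so
        # they cannot be reused even when object memory is.
        handle = next(self._finalizer_id_generator)
        self.__finalizer_dict[handle] = finalizer
        return handle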
How does that look to you ?
As I understand, you are using itertools.count() to handle concurrency, i.e. you assume that calls to next(itertools.count()) are atomic. I don't believe this is guaranteed by anything in the language and it would be perfectly fine for someone to create a Python runtime where it's not atomic. In general you should be using a mutex.
However, I have examined the current implementations in CPython and PyPy and found them to provide atomicity. The PyPy one is written in "interpreter level" code; as far as I understand, "interpreter level" Python code in PyPy is atomic in the same way C code is atomic in CPython, aside from explicit yield points.
Personally I would use a mutex but I would also not die on this hill.
Excellent point, thank you very much for your careful review.
My general position on concurrency in python-libusb1 is something like:
- support it...
- ...for reasonable uses (ex: the code will not fight against a caller setting pollFD notifiers multiple times in parallel)
- ...but even for unreasonable uses it must not crash (ex: an object handed out to C code must not be up for garbage collection until it is handed back)
- ...as much as possible without imposing performance costs on well-behaved code (like taking locks when the caller may be single-threaded)
I think count atomicity belongs to the 3rd point: if it goes wrong, some object will not be finalized in the proper order and may cause a segfault. So I must think more - hopefully I will have an idea not involving a lock.
import threading

class Finalizer:
    per_thread = threading.local()

    def next_sequence(self):
        # Per-thread counter combined with the current thread id: unique
        # without taking a lock, as long as thread ids are not reused.
        self.per_thread.sequence = getattr(self.per_thread, "sequence", 0) + 1
        return (self.per_thread.sequence, threading.current_thread().ident)
(Sorry, I've unintentionally omitted a part of the code before; I've edited it to reflect that.)
This solution ambiently assumes that you aren't sharing USBContext objects between threads, i.e. by the time a thread is dead, all of the objects referenced by it are also dead. I think that is a reasonable assumption but it should perhaps be made explicit.
by the time a thread is dead, all of the objects referenced by it are also dead
Indeed, if a subsequent thread is spawned and gets the same thread id, then duplicate keys may happen again.
I suspect there are setups where this may not hold though, like a worker-thread design where the number of threads changes and objects are not assigned to any given thread.
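The ident reuse is easy to observe (a quick demonstration, not python-libusb1 code):

import threading

def worker():
    pass

t1 = threading.Thread(target=worker)
t1.start(); t1.join()
old_ident = t1.ident  # ident stays readable after the thread exits

t2 = threading.Thread(target=worker)
t2.start(); t2.join()
print(old_ident == t2.ident)  # often True: idents may be recycled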
...The song of the lock-mermaids sure is tempting.
An uncontended lock should be quite cheap to take, and contention seems like it wouldn't be very common (how many people even have more than one USB context in the first place?)
It seems to be cheap indeed. I added a lock in 11e700bcbebceff3752a34849a39a360d0af4322 .
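Conceptually (a simplified sketch rather than the committed diff):

import itertools
import threading

class _FinalizerOwner:
    _finalizer_id_generator = itertools.count()
    _finalizer_id_lock = threading.Lock()

    @classmethod
    def _getFinalizerHandle(cls):
        # The lock makes key generation atomic on any runtime; taking it
        # uncontended is cheap.
        with cls._finalizer_id_lock:
            return next(cls._finalizer_id_generator)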
Nice! I'm a bit exhausted lately so I can't test this immediately.
One of the community members has tested it in Glasgow and it seems to work. Would it be possible to get a release with the fix included?
I had forgotten as well. Thanks for the heads-up. I have now released the fixes as 3.3.0, both here and on pypi.
Thank you very much for your patience and help.
No worries, thank you for maintaining this excellent library!