idna icon indicating copy to clipboard operation
idna copied to clipboard

The library is not thread-safe

Open socketpair opened this issue 3 years ago • 2 comments

   self_sign_pem_chain = await get_event_loop().run_in_executor(
 File "/usr/lib64/python3.9/concurrent/futures/thread.py", line 52, in run
   result = self.fn(*self.args, **self.kwargs)
 File "/usr/lib/python3.9/site-packages/ideco_crypto/cert.py", line 261, in get_selfgenerated_chain
   common_name = _normalize_domain(common_name, allow_underscore=allow_underscore)
 File "/usr/lib/python3.9/site-packages/ideco_crypto/cert.py", line 76, in _normalize_domain
   return prefix + idna.encode(domain, uts46=True).lower().decode('ascii')
 File "/usr/lib/python3.9/site-packages/idna/core.py", line 349, in encode
   s = uts46_remap(s, std3_rules, transitional)
 File "/usr/lib/python3.9/site-packages/idna/core.py", line 318, in uts46_remap
   from .uts46data import uts46data
ImportError: cannot import name 'uts46data' from partially initialized module 'idna.uts46data' (most likely due to a circular import) (/usr/lib/python3.9/site-packages/idna/uts46data.py)

This happens beacause uts46_remap() imports from .uts46data import uts46data inside. When this happens in two threads in parallel, it leads to error above. Really hard to trigger, but it happens sometimes.

socketpair avatar Sep 01 '21 14:09 socketpair

I imagine you can fix this for your application by calling idna.encode() once before the multiple threads are started, so the import will have already happened.

jribbens avatar Sep 01 '21 18:09 jribbens

Yes, I fixed exactly in this way. But the bug exists in the library anyway.

socketpair avatar Sep 01 '21 18:09 socketpair

This bug has been open a while and I am keen to fix it. Before I roll up my sleeves and do some research on the best approach, I'd invite anyone who is more familiar with an appropriate idiomatic approach to contribute a patch to resolve it. My aim would be to preserve lazy loading the UTS46 data until needed, if possible.

kjd avatar Aug 31 '22 19:08 kjd

There must be something non-obvious going on here, or a bug in Python itself or in libc on @socketpair's machine, because Python already uses a lock around imports - originally global, preventing all concurrent imports, and since 3.3 on a per-module basis:

In Python 3.3, importing a module takes a per-module lock. This correctly serializes importation of a given module from multiple threads (preventing the exposure of incompletely initialized modules)

https://docs.python.org/3.3/whatsnew/3.3.html#a-finer-grained-import-lock

So the original bug report "cannot happen".

If it is a bug in Python then it could be worked around by adding a manual lock around the import presumably...

_import_lock = threading.Lock()

def uts46_remap(domain: str, std3_rules: bool = True, transitional: bool = False) -> str:
    """Re-map the characters in the string according to UTS46 processing."""
    with _import_lock:
        from .uts46data import uts46data
    output = ''
    ...

jribbens avatar Sep 01 '22 12:09 jribbens

I had Python 3.9, as you can see. Possibly it's a bug. But I don't know where. Can you reproduce ? it's possible to suspend importing by inserting time.sleep() at the root scope of a file.

socketpair avatar Sep 02 '22 10:09 socketpair

I've not been able to reproduce this issue, so for now I am going to close this issue. If it can be reproduced happy to reopen.

kjd avatar Sep 13 '22 01:09 kjd