LibCST icon indicating copy to clipboard operation
LibCST copied to clipboard

Multiprocessing-only matchers bug on convert_type_comments

Open stroxler opened this issue 3 years ago • 1 comments

When I tried running the convert_type_comments codemod on a nontrivial directory, I got thousands of this error:

Codemodding /path/to/some/python/file.py
Traceback (most recent call last):
  File "/home/stroxler/kode/LibCST/libcst/codemod/_cli.py", line 288, in _execute_transform
    output_tree = transformer.transform_module(input_tree)
  File "/home/stroxler/kode/LibCST/libcst/codemod/_command.py", line 71, in transform_module
    tree = super().transform_module(tree)
  File "/home/stroxler/kode/LibCST/libcst/codemod/_codemod.py", line 108, in transform_module
    return self.transform_module_impl(tree_with_metadata)
  File "/home/stroxler/kode/LibCST/libcst/codemod/_visitor.py", line 32, in transform_module_impl
    return tree.visit(self)
  File "/home/stroxler/kode/LibCST/libcst/_nodes/module.py", line 90, in visit
    result = super(Module, self).visit(visitor)
  File "/home/stroxler/kode/LibCST/libcst/_nodes/base.py", line 228, in visit
    _CSTNodeSelfT, self._visit_and_replace_children(visitor)
  File "/home/stroxler/kode/LibCST/libcst/_nodes/module.py", line 74, in _visit_and_replace_children
    body=visit_body_sequence(self, "body", self.body, visitor),
  File "/home/stroxler/kode/LibCST/libcst/_nodes/internal.py", line 227, in visit_body_sequence
    return tuple(visit_body_iterable(parent, fieldname, children, visitor))
  File "/home/stroxler/kode/LibCST/libcst/_nodes/internal.py", line 193, in visit_body_iterable
    new_child = child.visit(visitor)
  File "/home/stroxler/kode/LibCST/libcst/_nodes/base.py", line 219, in visit
    should_visit_children = visitor.on_visit(self)
  File "/home/stroxler/kode/LibCST/libcst/matchers/_visitors.py", line 501, in on_visit
    _visit_constructed_funcs(self._extra_visit_funcs, self._matchers, node, self)
  File "/home/stroxler/kode/LibCST/libcst/matchers/_visitors.py", line 444, in _visit_constructed_funcs
    if _should_allow_visit(all_matchers, visit_func):
  File "/home/stroxler/kode/LibCST/libcst/matchers/_visitors.py", line 430, in _should_allow_visit
    return _all_positive_matchers_true(
  File "/home/stroxler/kode/LibCST/libcst/matchers/_visitors.py", line 409, in _all_positive_matchers_true
    if all_matchers[matcher] is None:
KeyError: ClassDef(name=DoNotCare(), body=DoNotCare(), bases=DoNotCare(), keywords=DoNotCare(), decorators=DoNotCare(), lpar=DoNotCare(), rpar=DoNotCare(), leading_lines=DoNotCare(), lines_after_decorators=DoNotCare(), whitespace_after_class=DoNotCare(), whitespace_after_name=DoNotCare(), whitespace_before_colon=DoNotCare(), metadata=DoNotCare())

It appears that for some reason all_matchers is missing a key. But this doesn't happen in the unit tests. This made me think that it might be multiprocessing-related, and sure enough when I ran the same command after applying this patch:

diff --git a/libcst/codemod/_cli.py b/libcst/codemod/_cli.py
index a7b1878..0b1b9e7 100644
--- a/libcst/codemod/_cli.py
+++ b/libcst/codemod/_cli.py
@@ -644,8 +644,13 @@ def parallel_exec_transform_with_prettyprint(  # noqa: C901
             for filename in files
         ]
         try:
+            '''
             for result in p.imap_unordered(
                 _execute_transform_wrap, args, chunksize=chunksize
+            ):
+            '''
+            for result in map(
+                _execute_transform_wrap, args,
             ):
                 # Print an execution result, keep track of failures
                 _print_parallel_result(

it works just fine!

I poked around a bit but didn't see an obvious answer for what's wrong - it seems like all_matchers winds up being self._matchers from the MatcherDecoratableTransformer class, which is set in __init__ and therefore intuitively ought to be reasonably thread safe. But evidence suggests that it isn't.

Is anyone familiar with this issue - maybe I missed something in the docs about how to use matcher decorators without weird multithreading bugs, I skimmed the docs again and didn't notice anything but ¯\_(ツ)_/¯

stroxler avatar Jan 25 '22 21:01 stroxler

cc @martindemello - if you are trying to run these codemods, you may need to apply the patch above so that things run in serial (or write your own commandline wrapper around the transform, particularly if you're trying to run it in chunks so big that you need parallelization).

As far as I can tell so far, the codemod is working reliably as long as I avoid the multiprocessing - I've run it on at least 10k modules or so thus far with hardly any problems

stroxler avatar Jan 26 '22 00:01 stroxler