green stopTest issue with Python 3.12.1 and skipped tests

We have a unittest that is decorated with @unittest.skipIf(...), and when green is collecting tests results in stopTest() we get an AttributeError for start_time on this line: self.test_time = str(time.time() - self.start_time).

Since self.start_time is being set in startTest() but this seems to be skipped on skipped tests, but stopTest() is still being called, it looks like it is a new compatibility issue.

If I silence the AttributeError, we get an other one later on proto_test_result.errors in addProtoTestResult(), so the fix might get tricky.

I will take a look at a permanent fix after 3.5.0 gets released. For new we have commented out the skipped test in our codebase so we can use 3.12.1.

https://docs.python.org/3/whatsnew/changelog.html#python-3-12-1-final

I suspect it might be caused by this change: https://github.com/python/cpython/pull/109725

Dec 12 '23 00:12 sodul

We should be able to check if self.start_time exists in stopTest(), and set self.test_time to a zero duration if it does not.

Dec 15 '23 19:12 CleanCut

For the short term that should work.

Something I was thinking about looking into is to have the classes used by green to inherit from the official classes so that they will be more future proof. But I would do that after cleaning up the code from all pre 3.8 support to get the codebase cleaned up and organized.

Dec 19 '23 19:12 sodul

My plan is to get v4.0.0 released based of the current main in the next few days so that we can have support for python 3.12.0 with a note that 3.12.1 is known to break on skipped tests. This should allow most users of green to be able to use green with python 3.12 while I work on fixing the skipped tests issue. I will probably release v4.1.0 a few days later with this issue fixed.

Jan 13 '24 04:01 sodul

I've released Green 4.0.0 with python 3.12 support. Now that the minimal python version is 3.8 I should be able to refactor the code more easily and will work on the 3.12.1 compatibility issue this week.

Jan 16 '24 22:01 sodul

I made some progress on this issue. I added a test with the skip annotation to reproduce the bug and I'm able to run ./g -s1 green with python 3.12.1, but without -s1.

> ./g -s1 green   
..........................................ssssssss........................................................................s
[ ... snip ... ]
Ran 253 tests in 4.727s using 1 process

OK (passes=114, skips=9)

And:

> ./g green    
..........................................ssssssss.................................................................[stuck]

After looking at green/result.py and unittest/result.py, it is clear that the internals have drifted upstream so my current approach is to have green's BaseTestResult inherit from unittest.TestResult and call super().<method>() where needed. I believe this is the approach that will provide the most forward compatibility while potentially reducing duplicated code.

I had to add some type annotation to help me track things, but this is triggering some cascading error reporting and I do not want to fix everything at once here.

Jan 22 '24 06:01 sodul

-s2 seems ok

> ./g -s3 green
..........................................ssssssss.................................................................^C^CTraceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/stephane/repos/clumio-oss/green/green/cmdline.py", line 103, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/stephane/repos/clumio-oss/green/green/cmdline.py", line 99, in main
    return _main(argv, testing)
           ^^^^^^^^^^^^^^^^^^^^
  File "/Users/stephane/repos/clumio-oss/green/green/cmdline.py", line 67, in _main
    result = run(test_suite, stream, args, testing)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stephane/repos/clumio-oss/green/green/runner.py", line 134, in run
    proto_test_result: ProtoTest | ProtoTestResult = queue.get()
                                                     ^^^^^^^^^^^

It seems that it is having an issue with the tests in test_runner.py. I might have to update the mocks there since python -m unittest green/test/test_runner.py is getting stuck as well.

Jan 22 '24 06:01 sodul

My original idea to have green's BaseTestResult class inherit from unittest.TestResult seems to get the tests to freeze on the internal tests with macOS and Windows, but not in Linux according to GH actions. I'll shelve this for now and will re-implement the new attributes added in Python 3.12.1 as a short term fix. Longer term I do believe inheriting is the proper approach but we will need refactor the internal queueing code of green to properly identify where the race condition occurs. Thanks to mypy some issues have been uncovered but it will take some time to implement all this refactoring.

https://github.com/CleanCut/green/compare/main...clumio-oss:green:python-3.12.1

Jan 25 '24 05:01 sodul

I think it would be great to mostly inherit and/or use upstream implementations. I tried to do that as much as I could...but it wouldn't work in a lot of places. Maybe it will work better now.

Jan 25 '24 21:01 CleanCut

There is something odd with the ProtoTestResult queue when I use py3.12.1, after adding the missing attributes, now I'm getting errors because we get None and ProtoTest instances passed to addProtoTestResult(), which then leads to further errors. I tried to 'ignore' by checking for these unexpected types, but then we get stuck on queue.get(). I'm currently looking at adding more type annotation to better understand how the queue is constructed. I suspect that the logic to detect that there is no more work to be done will have to be tuned. I'll get to the bottom of this.

Jan 26 '24 17:01 sodul

Ya, the queue communication has always been super finicky and hard to debug. Any improvements to that (or to the surrounding architecture) are most welcome.

Jan 30 '24 20:01 CleanCut

For reference I did a diff of the unittest folder in python 3.12.0 and 3.12.1 to help me better understand what the actual changes are:

diff -u ut3.12.0/case.py ut3.12.1/case.py
--- ut3.12.0/case.py	2024-02-06 17:25:20
+++ ut3.12.1/case.py	2024-02-06 17:24:26
@@ -606,7 +606,6 @@
         else:
             stopTestRun = None
 
-        result.startTest(self)
         try:
             testMethod = getattr(self, self._testMethodName)
             if (getattr(self.__class__, "__unittest_skip__", False) or
@@ -616,6 +615,9 @@
                             or getattr(testMethod, '__unittest_skip_why__', ''))
                 _addSkip(result, self, skip_why)
                 return result
+
+            # Increase the number of tests only if it hasn't been skipped
+            result.startTest(self)
 
             expecting_failure = (
                 getattr(self, "__unittest_expecting_failure__", False) or
diff -u ut3.12.0/loader.py ut3.12.1/loader.py
--- ut3.12.0/loader.py	2024-02-06 17:25:20
+++ ut3.12.1/loader.py	2024-02-06 17:24:26
@@ -84,9 +84,13 @@
             raise TypeError("Test cases should not be derived from "
                             "TestSuite. Maybe you meant to derive from "
                             "TestCase?")
-        testCaseNames = self.getTestCaseNames(testCaseClass)
-        if not testCaseNames and hasattr(testCaseClass, 'runTest'):
-            testCaseNames = ['runTest']
+        if testCaseClass in (case.TestCase, case.FunctionTestCase):
+            # We don't load any tests from base types that should not be loaded.
+            testCaseNames = []
+        else:
+            testCaseNames = self.getTestCaseNames(testCaseClass)
+            if not testCaseNames and hasattr(testCaseClass, 'runTest'):
+                testCaseNames = ['runTest']
         loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
         return loaded_suite
 
@@ -95,7 +99,11 @@
         tests = []
         for name in dir(module):
             obj = getattr(module, name)
-            if isinstance(obj, type) and issubclass(obj, case.TestCase):
+            if (
+                isinstance(obj, type)
+                and issubclass(obj, case.TestCase)
+                and obj not in (case.TestCase, case.FunctionTestCase)
+            ):
                 tests.append(self.loadTestsFromTestCase(obj))
 
         load_tests = getattr(module, 'load_tests', None)
@@ -164,7 +172,11 @@
 
         if isinstance(obj, types.ModuleType):
             return self.loadTestsFromModule(obj)
-        elif isinstance(obj, type) and issubclass(obj, case.TestCase):
+        elif (
+            isinstance(obj, type)
+            and issubclass(obj, case.TestCase)
+            and obj not in (case.TestCase, case.FunctionTestCase)
+        ):
             return self.loadTestsFromTestCase(obj)
         elif (isinstance(obj, types.FunctionType) and
               isinstance(parent, type) and
diff -u ut3.12.0/mock.py ut3.12.1/mock.py
--- ut3.12.0/mock.py	2024-02-06 17:25:20
+++ ut3.12.1/mock.py	2024-02-06 17:24:26
@@ -827,7 +827,7 @@
 
 
     def _format_mock_failure_message(self, args, kwargs, action='call'):
-        message = 'expected %s not found.\nExpected: %s\nActual: %s'
+        message = 'expected %s not found.\nExpected: %s\n  Actual: %s'
         expected_string = self._format_mock_call_signature(args, kwargs)
         call_args = self.call_args
         actual_string = self._format_mock_call_signature(*call_args)
@@ -930,7 +930,7 @@
         if self.call_args is None:
             expected = self._format_mock_call_signature(args, kwargs)
             actual = 'not called.'
-            error_message = ('expected call not found.\nExpected: %s\nActual: %s'
+            error_message = ('expected call not found.\nExpected: %s\n  Actual: %s'
                     % (expected, actual))
             raise AssertionError(error_message)
 
@@ -981,7 +981,7 @@
                 raise AssertionError(
                     f'{problem}\n'
                     f'Expected: {_CallList(calls)}'
-                    f'{self._calls_repr(prefix="Actual").rstrip(".")}'
+                    f'{self._calls_repr(prefix="  Actual").rstrip(".")}'
                 ) from cause
             return
 
diff -u ut3.12.0/result.py ut3.12.1/result.py
--- ut3.12.0/result.py	2024-02-06 17:25:20
+++ ut3.12.1/result.py	2024-02-06 17:24:26
@@ -97,10 +97,12 @@
 
             sys.stdout = self._original_stdout
             sys.stderr = self._original_stderr
-            self._stdout_buffer.seek(0)
-            self._stdout_buffer.truncate()
-            self._stderr_buffer.seek(0)
-            self._stderr_buffer.truncate()
+            if self._stdout_buffer is not None:
+                self._stdout_buffer.seek(0)
+                self._stdout_buffer.truncate()
+            if self._stderr_buffer is not None:
+                self._stderr_buffer.seek(0)
+                self._stderr_buffer.truncate()
 
     def stopTestRun(self):
         """Called once after all tests are executed.

Taking the unittest folder from py3.12.0 and putting it in 3.12.1 seems to help. This is good because this should help with getting to the root of the issue.

Feb 07 '24 01:02 sodul

I have encouraging news!

Python 3.12.2 was released yesterday and the bug fixes include changes to the unittests package. Since pyenv does not support it yet I took the unittest folder from 3.12.2 and placed it inside a 3.12.1 install and green started to work again on skipped tests:

> green --version
Green 4.0.1-dev, Coverage 7.4.1, Python 3.12.1

> green -rv green
...................................................................................................................s....................................................................................................................s

Skipped green.test.test_output.TestGreenStream.testDisableWindowsFalse - Colorama won't strip ANSI unless running on Windows

Skipped green.test.test_windows.TestWindows.test_colorOutput - This test is for windows-specific behavior.

Name                     Stmts   Miss  Cover   Missing
------------------------------------------------------
green/__init__.py            2      2     0%   1-3
green/__main__.py            3      3     0%   1-5
green/cmdline.py            61     13    79%   90-105
green/command.py            46      0   100%
green/config.py              0      0   100%
green/djangorunner.py       62      0   100%
green/examples.py           27     27     0%   1-69
green/exceptions.py          3      0   100%
green/junit.py             123      0   100%
green/loader.py            264      0   100%
green/output.py            101      0   100%
green/process.py            54      0   100%
green/result.py            412      3    99%   205, 223, 342
green/runner.py             73      0   100%
green/suite.py             126     13    90%   109, 113-116, 128-133, 139-142
green/test/__init__.py       0      0   100%
green/version.py             0      0   100%
------------------------------------------------------
TOTAL                     1357     61    96%

Ran 233 tests in 5.456s using 10 processes

OK (passes=231, skips=2)

The diff is pretty small so I might be able to add custom logic just for 3.12.1, or we can just flag 3.12.1 as incompatible with green.

> diff ut3.12.1 ut3.12.2
diff ut3.12.1/case.py ut3.12.2/case.py
608a609
>         result.startTest(self)
618,620d618
< 
<             # Increase the number of tests only if it hasn't been skipped
<             result.startTest(self)
diff ut3.12.1/main.py ut3.12.2/main.py
283c283
<             if self.result.testsRun == 0:
---
>             if self.result.testsRun == 0 and len(self.result.skipped) == 0:
diff ut3.12.1/mock.py ut3.12.2/mock.py
2202,2203c2202,2206
< _CODE_ATTRS = dir(CodeType)
< _CODE_SIG = inspect.signature(partial(CodeType.__init__, None))
---
> try:
>     _CODE_SIG = inspect.signature(partial(CodeType.__init__, None))
>     _CODE_ATTRS = dir(CodeType)
> except ValueError:
>     _CODE_SIG = None
2223,2225c2226,2231
<         code_mock = NonCallableMock(spec_set=_CODE_ATTRS)
<         code_mock.__dict__["_spec_class"] = CodeType
<         code_mock.__dict__["_spec_signature"] = _CODE_SIG
---
>         if _CODE_SIG:
>             code_mock = NonCallableMock(spec_set=_CODE_ATTRS)
>             code_mock.__dict__["_spec_class"] = CodeType
>             code_mock.__dict__["_spec_signature"] = _CODE_SIG
>         else:
>             code_mock = NonCallableMock(spec_set=CodeType)
diff ut3.12.1/result.py ut3.12.2/result.py
100,105c100,103
<             if self._stdout_buffer is not None:
<                 self._stdout_buffer.seek(0)
<                 self._stdout_buffer.truncate()
<             if self._stderr_buffer is not None:
<                 self._stderr_buffer.seek(0)
<                 self._stderr_buffer.truncate()
---
>             self._stdout_buffer.seek(0)
>             self._stdout_buffer.truncate()
>             self._stderr_buffer.seek(0)
>             self._stderr_buffer.truncate()
diff ut3.12.1/runner.py ut3.12.2/runner.py
277c277
<         elif run == 0:
---
>         elif run == 0 and not skipped:

Feb 07 '24 19:02 sodul

This is the bug report on the CPython side: https://github.com/python/cpython/issues/113267

Feb 07 '24 19:02 sodul

we can just flag 3.12.1 as incompatible with green

That would be my vote, but I'll let you decide.

Feb 07 '24 19:02 CleanCut

I'm waiting for pyenv and GH actions to both provide 3.12.2 and then I will update the GH actions, documentation and the toml file to mark 3.12.1 incompatible. There is no rush since pypi.org and pip do not make it easy to update older releases to reflect compatibility conflicts unfortunately.

Feb 10 '24 06:02 sodul