beets In test I/O utility, restore the old `stdin`/`stdout` instead of the "true" I/O streams

In test I/O utility, restore the old `stdin`/`stdout` instead of the "true" I/O streams

Open sampsyo opened this issue 8 months ago • 6 comments

I got a little bit nerdsniped by the problems observed in #5027. In short, my high-level diagnosis in https://github.com/beetbox/beets/pull/5027#issuecomment-1857953929 seems to have been correct: other tests were suppressing the legitimate failure of a flaky test.

I found the problem by running other tests before the problem test, like this:

$ pytest -k 'test_nonexistant_db or test_delete_removes_item' test/test_ui.py

When running test_nonexistant_db alone, it fails. When running it like this with another test that goes first, it passes. That's the problem.

However, test_delete_removes_item is just one example that works to make this problem happen. It appeared that any test in a class that used our _common.TestCase base class had this power. I tracked down the issue to our DummyIO utility, which was having an unintentional effect even when it was never actually used.

Here's the solution. Instead of restoring sys.stdin to sys.__stdin__, we now restore it to whatever it was before we installed out dummy I/O hooks. This is relevant in pytest, for example, which installs its own sys.stdin, which we were then clobbering. This was leading to the suppression of test failures observed in #5021 and addressed in #5027.

The CI will fail for this PR because it now (correctly) exposes a failing test. Hopefully by combining this with the fixes in the works in #5027, we'll be back to a passing test suite. :smiley: @Phil305, could you perhaps help validate that hypothesis?

Dec 15 '23 14:12 sampsyo

Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry.

Dec 15 '23 14:12 github-actions[bot]

😃 @Phil305, could you perhaps help validate that hypothesis?

Awesome, yeah will do once I'm available, thanks so much for looking into this! I'll rebase on top of your diff and validate

Dec 16 '23 00:12 Phil305

@sampsyo all green when running pytest -k nonex test/test_ui.py!

============================= test session starts ==============================
platform darwin -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0
rootdir: /Users/pjulien/repos/beets
plugins: typeguard-3.0.2, anyio-4.1.0
collected 132 items / 131 deselected / 1 selected

test/test_ui.py .                                                        [100%]

=============================== warnings summary ===============================
../../homebrew/lib/python3.11/site-packages/mediafile.py:52
  /Users/pjulien/homebrew/lib/python3.11/site-packages/mediafile.py:52: DeprecationWarning: 'imghdr' is deprecated and slated for removal in Python 3.13
    import imghdr

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 1 passed, 131 deselected, 1 warning in 0.34s =================

There's only one more failing test for me now (test_load_item_types); I'll be looking into that next (note: this test was failing before the changes from our diffs, it was just getting hit after the one I raised an issue for): python3 test/testall.py

...
======================================================================
FAIL: test_load_item_types (test_metasync.MetaSyncTest.test_load_item_types)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/pjulien/repos/beets/test/test_metasync.py", line 93, in test_load_item_types
    self.assertIn("amarok_score", Item._types)
AssertionError: 'amarok_score' not found in {'data_source': <beets.dbcore.types.String object at 0x101b3a2d0>}

----------------------------------------------------------------------
Ran 1169 tests in 42.193s

FAILED (failures=1, skipped=16)

Seems to be similar in nature to the issue I raised (i.e. some shared state between tests being changed based on test ordering; running pytest -k test_load_item_types results in a passing test). Something's causing the {'data_source': <beets.dbcore.types.String object at 0x101b3a2d0>} dictionary to not have the data it needs for the test to pas...

Dec 16 '23 23:12 Phil305

Awesome; thanks taking a look!!

From @wisp3rwind:

Maybe, given that there's now the installed flag, we could add a guard for nested install calls?

This is a good call. I'll add a check right now.

From @Phil305:

There's only one more failing test for me now (test_load_item_types);

Fascinating! That does look like an unrelated problem; we can try to investigate that next.

For this specific problem, I'm going to pull the change from #5027 into this PR and see if everything goes green. 🤞

Dec 18 '23 22:12 sampsyo

Oh dear, that actually reveled several places where a DummyIO was being double-installed (i.e., it wasn't being cleaned up correctly)! I'll need to come back to this to investigate more deeply…

Dec 18 '23 22:12 sampsyo

Hey guys, I think we'll want to hold off on putting that assertion there, unless we're willing to perhaps redo the structure of the test suites for the failing tests.

TLDR

I'm wondering if there's a way to fix the failing assertion by moving away from shared state, and opting for sharing functionality via stateless functions and explicit definition of state changes that are completely isolated to each test suite. Any worthwhile code duplication could be abstracted away behind the previously mentioned stateless functions. I think this could expose w/e is causing the test failures in an obvious way, making for an easy fix (though I have no proof to back it up unfortunately! My investigation ended up to this point so far 🙂)

Context

I looked into the code, and found a lot of non-obvious implicit state changes for the test suites the failing test cases (listed at bottom of this comment) belong to (i.e. global and class instance-level state changes happening via a class hierarchy, instead of explicity) that make it alot harder for me to fix the assertion failures (most of the info in the diagram isn't directly related to the failing tests, but they highlight the overall situation, so please bear with me!):

Some questions I have, based on what I'm seeing:

test_ui_importer.ImportExistingTest calls self._setup_import_session 2 times, which breaks the assertion in DummyIO: Check for double "installs", but removing one or the other causes more failures... why do we need to use self._setup_import_session() via a superclass (TerminalImportSessionSetup - code pointer)? Can't we just turn any shared logic (i.e. config modifications) into standalone functions, and move the other logic that's been abstracted away directly into each test case for more flexibility? I feel like alot of the state changing logic happening via inheritance is a major anti-pattern, but I'm not familiar enough with the beets architecture to really be able to say definitively (i.e. this might be required because of some hard-to-test behavior of beets itself); I'd love to modify it so we can keep the assertion in the pending commit). What do you guys think?
We're setting at least 8 variables on the test_ui_importer.ImportTest test suite implicitly through superclasses... I'm wondering why we do it like this? This certainly makes understanding and modifying any test cases sharing this structure much more difficult, since it's not obvious what state the tests share from a single place (which would ideally be in only one place, either the setUp method of relevant test_ui_importer.*Test or the parent classes from test_importer which share the same names). Can we change this structure to flatten the class hierarchy (or, at the least, to make all modification of state for the test suites related to these failing tests explicit in one place?)
We're modifying global state (when we modify beet.__init__.config in TerminalImportSessionSetup._setup_import_session), which is implicitly updating the settings on the ImportTest.config variable... why do we do things like this? Can we avoid any global state changes in these tests to make them less likely to affect each other? Is this required because of some part of the beets design that I'm not aware of? It seems like an antipattern to modify global state that tests share, but again, I don't know enough to say anything with absolute certainty

Here's a class diagram to show the current class hierarchy in a cleaner way (the other failing tests are setup essentially the same way):

note: The following tests are failing because of the DummyIO assertion:

ImportTest::test_empty_directory_singleton_warning
ImportTest::test_empty_directory_warning
ImportSingletonTest::test_import_single_files
ImportExistingTest::test_asis_updated_moves_file
ImportExistingTest::test_asis_updated_without_copy_does_not_move_file
ImportExistingTest::test_asis_updates_metadata
ImportExistingTest::test_does_not_duplicate_album
ImportExistingTest::test_does_not_duplicate_item
ImportExistingTest::test_does_not_duplicate_singleton_track
ImportExistingTest::test_outside_file_is_copied
ImportExistingTest::test_outside_file_is_moved

Dec 25 '23 23:12 Phil305

beets beets copied to clipboard

In test I/O utility, restore the old `stdin`/`stdout` instead of the "true" I/O streams

TLDR

Context

beets
beets copied to clipboard