pytest-xdist

Constantly hanging test run with the plugin

Open shardakov opened this issue 2 years ago • 12 comments

Hello,

We are using:

platform linux -- Python 3.9.5, pytest-6.2.5, py-1.10.0, pluggy-0.13.1
plugins: forked-1.4.0, xdist-2.5.0, pytest_check-1.0.4, teamcity-messages-1.29, anyio-3.3.4, testrail-2.9.1, dependency-0.5.1

When we run pytest with xdist against a remote Windows host using --dist=loadfile, the test run hangs.

The command:

python3 -m pytest -vv --dist=loadfile --tx ssh=admin@test-host-ip --rsyncdir /tmp/autotests_rsync C:\\users\\admin\\pyexecnetcache\\autotests_rsync\\autotests\\testsuite\\positive

The hang occurs in a test that uses a waiter to fetch a value from a PostgreSQL database via the SQLAlchemy ORM.

We pass a value from the test suite into the following test code:

start_time = Waiter.wait_new(lambda: DbTestData.get_session_records_column_by_record_id(
            DbTestData.start_time, record_id)[0][0],
                                     check_func=CheckFunctions.check_none,
                                     error_message=f"Error")
assert start_time is not None, f"Record start_time in db = {start_time}, expected not None"

def query(*args):
    session = SessionHolder.get_session()
    result = session.query(*args)
    session.commit()
    return result

which uses this waiter:

    @staticmethod
    def wait_new(func: Callable, check_func: Callable = CheckFunctions.check_empty, timeout_value: int = 20,
                 timeout_interval: int = 1, error_message: str = ""):
        print(f"Func = {func}")
        value = waiter_exception
        exc_raise_if_fail = TestWaiterException()
        timeout = 0
        in_while = True
        Logger.utils_logger.debug(f"timeout_value = {timeout_value}, timeout_interval = {timeout_interval})")
        while in_while:
            print(f"in_while loop")
            try:
                print(f"Trying execute func")
                value = func()
            except Exception as ex:
                print(f"Exception")
                if timeout == timeout_value:
                    exc_raise_if_fail.with_traceback(sys.exc_info()[2])
                    exc_raise_if_fail.txt += ": " + ex.args[0]
                    in_while = False
                value = waiter_exception
                Logger.utils_logger.debug(f"Exception", exc_info=True)
            finally:
                print(f"Finally")
                Logger.utils_logger.debug(f"Current value: {value}")
                if (timeout > timeout_value) or (value != waiter_exception and not check_func(value)):
                    print(f"Break")
                    break
                else:
                    print(f"Else")
                    timeout += timeout_interval
                    time.sleep(timeout_interval)
        if value == waiter_exception:
            Logger.utils_logger.critical(f"{exc_raise_if_fail.txt}, {error_message}")
            raise exc_raise_if_fail
        return value

It hangs permanently while executing the waiter, but only when we use the xdist plugin.
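
One way to see where the worker is stuck is to have Python dump every thread's traceback if the waiter has not returned after a deadline. Below is a minimal sketch using the standard-library `faulthandler` module. The helper name `run_with_hang_dump` and the 60-second default are hypothetical, not part of the test suite above:

```python
import faulthandler
import sys


def run_with_hang_dump(func, dump_after=60.0, file=None):
    """Call func(); if it is still running after dump_after seconds,
    write the traceback of every thread to `file`.

    The call itself is not interrupted, so a real hang stays a hang,
    but the dump shows which frame each thread is blocked in.
    `file` must be backed by a real file descriptor.
    """
    target = sys.stderr if file is None else file
    faulthandler.dump_traceback_later(dump_after, exit=False, file=target)
    try:
        return func()
    finally:
        # Cancel the pending dump once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the `Waiter.wait_new(...)` call in such a helper should show whether the worker is blocked inside the database query, inside execnet, or somewhere else. pytest also has a `faulthandler_timeout` ini option that serves a similar purpose for whole tests.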

shardakov avatar Jan 30 '23 15:01 shardakov

hi i took the liberty to make the code block multi line

does it only hang on windows hosts, or also on linux hosts?

it is possible/thinkable that the execmodel code which was added creates an issue by having a remote_exec not run on the main thread

do your utilities require running in the main thread by chance?
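
A quick way to answer that is to check, from inside the test or the waiter, which thread the code is actually running on. A minimal sketch using only the standard library; both function names are hypothetical:

```python
import threading


def running_in_main_thread():
    """Return True if the caller executes on the interpreter's main thread."""
    return threading.current_thread() is threading.main_thread()


def check_from_worker_thread():
    """Run the same check from a freshly started thread, for comparison."""
    result = []
    worker = threading.Thread(target=lambda: result.append(running_in_main_thread()))
    worker.start()
    worker.join()
    return result[0]
```

Printing `running_in_main_thread()` at the top of the hanging test, once with xdist and once without, would show whether the two runs differ in this respect.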

RonnyPfannschmidt avatar Jan 30 '23 16:01 RonnyPfannschmidt

also for context - execmodel is something in execnet itself

to verify the issue one may need to downgrade execnet to a version older than execnet 1.2 from before 2014

RonnyPfannschmidt avatar Jan 30 '23 16:01 RonnyPfannschmidt

Thank you for your reply. I downgraded execnet to version 1.2, but the problem remained: the tests still hang.

shardakov avatar Jan 31 '23 12:01 shardakov

version 1.2 is the first version with the supposed issue, please try even older
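
When switching between versions like this, it helps to confirm which version is actually installed in the environment that launches pytest. A minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("execnet", "pytest-xdist", "pytest"):
        print(name, installed_version(name))
```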

RonnyPfannschmidt avatar Jan 31 '23 12:01 RonnyPfannschmidt

We tried running the tests with execnet 1.1, but the problem remained. We also use execnet directly in fixtures that run before the tests, and that works fine.

shardakov avatar Feb 02 '23 13:02 shardakov

then right now, i'm unaware of what causes them, i presume there is no known simplified reproducer

RonnyPfannschmidt avatar Feb 02 '23 15:02 RonnyPfannschmidt

We did some investigation: the hang appears while executing any execnet script on the remote host. We created a test class to see how it executes, and it also hangs:

```python
class TestLogger:

    @pytest.fixture(scope="class")
    def testing_loggger(self):
        try:
            print(f"user: {reserve_user}, host: {reserve_host}")
            gw = execnet.makegateway(f"ssh={reserve_user}@{reserve_host}//python=python3.9")
            channel = gw.remote_exec("""
                try:
                    import os, traceback, logging
                    logging.basicConfig(level=logging.DEBUG, filename=f"C:/test_artifacts/reserve_station_logs/reserve_data_files_sizes.log", filemode='w', format='%(asctime)s - %(levelname)s - %(message)s', datefmt='%d-%m-%Y %H:%M:%S')
                    flow = None
                    dirs = []
                    audio_data_files_sizes = {}
                    logging.debug("Something")
                    channel.send(("a", "b"))
                except Exception as ex:
                    logging.error("Exception", exc_info=True)
                    channel.send(ex)
                """)
            audio_data_files_sizes, video_files_sizes = channel.receive()
            print(audio_data_files_sizes, video_files_sizes)
        except Exception:
            Logger.tests_logger.error("Fixture testing_loggger", exc_info=True)
            pytest.skip("Fixture testing_loggger failure")

    def test_log(self, testing_loggger):
        assert True
```

Here we used execnet versions 1.1, 1.0.3, and 1.0.5 and launched the tests on one worker.
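
Because `channel.receive()` blocks indefinitely by default, a stalled remote side stalls the whole fixture. A minimal sketch of bounding the wait, assuming execnet's documented `receive(timeout=...)` parameter and the `channel.TimeoutError` it raises. `receive_or_default` is a hypothetical helper, and `FakeChannel` is only a stand-in so the example runs without a remote host:

```python
def receive_or_default(channel, timeout=30.0, default=None):
    """Return the next item from `channel`, or `default` if nothing
    arrives within `timeout` seconds."""
    try:
        return channel.receive(timeout=timeout)
    except channel.TimeoutError:
        return default


class FakeChannel:
    """Stand-in for an execnet channel whose remote side never replies."""

    TimeoutError = TimeoutError  # execnet channels expose their own TimeoutError

    def receive(self, timeout=None):
        raise self.TimeoutError(f"no item received within {timeout} seconds")


if __name__ == "__main__":
    print(receive_or_default(FakeChannel(), timeout=0.1, default="timed out"))
```

With a bounded receive, the fixture would fail or skip after the timeout instead of hanging, which at least turns the hang into a reportable error.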

shardakov avatar Feb 03 '23 15:02 shardakov

Then it's possibly an execnet bug, i have a larger change to it in the works but that's months away from landing

RonnyPfannschmidt avatar Feb 03 '23 17:02 RonnyPfannschmidt

Hi! @RonnyPfannschmidt Sorry to bother you, but are there any updates on this issue?

shardakov avatar Mar 10 '23 13:03 shardakov

Unfortunately not

RonnyPfannschmidt avatar Mar 10 '23 13:03 RonnyPfannschmidt

Hi! I have the same issue. Has this problem been solved already?

Feklan avatar Jun 15 '23 12:06 Feklan