bash-completion test: Intermittent test failures by timeouts, etc. with high loadavg

test: Intermittent test failures by timeouts, etc. with high loadavg

Open akinomyoga opened this issue 2 years ago • 1 comments

Let me create an issue for the discussion made in #687 before we forget it.

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1021580514 by @sudonym1

is caused by the timeout we often see

Is there a ticket for this? I wouldn't mind taking a look this upcoming weekend.

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1021630333 by @akinomyoga

Thank you! I don't remember any. I have quickly searched, but there seems none explicitly mentioning this issue. #136 and #137 seem to be the timeouts caused by the test case itself. I'm not sure whether the cause of #139 is the test case itself or our test framework. There is a report #430 saying that the test fails or is flaky in ARM architecture with high loads but there is no further information. #573 is related to refactoring the test suite. I couldn't find any issue explicitly mentioning the timeout failure.

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1025089837 by @scop

Timeouts are a quite general failure mode in the test suite.

Sometimes they occur when tests fail to process expected/unexpected output or the lack of it, kind of masking the root cause. These are fairly reproducible and straightforward to fix.

Sometimes they occur due to something else, perhaps a command actually stalling every now and then, this would be hard to reproduce and fix.

But the most "interesting" thing to me (and the one I guess we're referring to) here is that I think we may still have something wrong in our test suite that triggers this in unknown situations that is not caused by the things mentioned here. Would be awesome to track these down and fix this category of issues. I believe enabling parallel tests provokes the failures more often -- at least it does provoke some failures more. Parallel running happens by default in local dockerized test runs, and can be enabled on the host without docker e.g. like
$ PYTESTFLAGS="--numprocesses=auto --dist=loadfile" make check
https://github.com/scop/bash-completion/pull/687#issuecomment-1025321800 by @sudonym1

It seems I have been getting (un)lucky. I've run the fedora docker container ~10 times with parallel pytests, and no failures so far. I will try a different approach. It could be these issues are related to the performance characteristics of the host computer.

https://github.com/scop/bash-completion/pull/687#issuecomment-1025351856 by @akinomyoga

I guess the problem is caused by some race conditions or the synchronization problem of input and output pipes in the communications between pexpect and the Bash process. There is already some sleep that I guess is related.

Actually, I'm not sure if the patterns specified here covers all the possible cases of the Bash outputs. What happens when some garbage outputs remained in the stdout before writing something to stdin (though I'm not even sure whether that can happen because we are checking MAGIC_MARK between every roundtrip)? The accepted patterns in the current testing code are

/\r\n<PROMPT><COMMAND>.*<MAGIC_MARK>/

/^<MAGIC_MARK>/

/^([^\r]+)<MAGIC_MARK>$/

/^([^\r].*)<MAGIC_MARK>$/

I feel we should just accept [\s\S]*<MAGIC_MARK> and then examine the extracted outputs, or at least we can add another fallback pattern in the above list in order to check what pexpect actually captured.

Feb 24 '22 02:02 akinomyoga

For the record, I was faced with a new type of false test failure in Action #977:

=================================== FAILURES ===================================
_________________ TestUnitInitCompletion.test_redirect_2[&>>] __________________

self = <test_unit_initialize.TestUnitInitCompletion object at 0x7fc051593ba8>
bash = <pexpect.pty_spawn.spawn object at 0x7fc052372470>
cmd1_empty_completion_setup = None, redirect = '&>>'

    @pytest.mark.parametrize("redirect", "> >> 2> {fd1}> < &> &>> >|".split())
    def test_redirect_2(self, bash, cmd1_empty_completion_setup, redirect):
        # Note: Bash 4.3 and below cannot properly extract the redirection ">|"
        if redirect == ">|":
            skipif = "((BASH_VERSINFO[0] * 100 + BASH_VERSINFO[1] < 404))"
            try:
                assert_bash_exec(bash, skipif, want_output=None)
            except AssertionError:
                pass
            else:
                pytest.skip(skipif)
    
        completion = assert_complete(
>           bash, "cmd1 %s f" % redirect, cwd="shared/default"
        )

../t/unit/test_unit_initialize.py:57: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../t/conftest.py:882: in assert_complete
    return CompletionResult()
../t/conftest.py:474: in __exit__
    self._restore_env()
../t/conftest.py:650: in _restore_env
    raise self.captured_error
../t/conftest.py:501: in _safe_assert
    assert_bash_exec(self.bash, cmd, want_output=None)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

bash = <pexpect.pty_spawn.spawn object at 0x7fc052372470>
cmd = '[[ ${OLDPWD-Re8SCgEdfN} == "${_comp__test_1321_NEWVAR_OLDPWD-Re8SCgEdfN}" ]]'
want_output = None, want_newline = True

    def assert_bash_exec(
        bash: pexpect.spawn,
        cmd: str,
        want_output: Optional[bool] = False,
        want_newline=True,
    ) -> str:
        """
        :param want_output: if None, don't care if got output or not
        """
    
        # Send command
        bash.sendline(cmd)
        bash.expect_exact(cmd)
    
        # Find prompt, output is before it
        bash.expect_exact("%s%s" % ("\r\n" if want_newline else "", PS1))
        output = bash.before
    
        # Retrieve exit status
        echo = "echo $?"
        bash.sendline(echo)
        got = bash.expect(
            [
                r"^%s\r\n(\d+)\r\n%s" % (re.escape(echo), re.escape(PS1)),
                PS1,
                pexpect.EOF,
                pexpect.TIMEOUT,
            ]
        )
        status = bash.match.group(1) if got == 0 else "unknown"
    
>       assert status == "0", 'Error running "%s": exit status=%s, output="%s"' % (
            cmd,
            status,
            output,
        )
E       AssertionError: Error running "[[ ${OLDPWD-Re8SCgEdfN} == "${_comp__test_1321_NEWVAR_OLDPWD-Re8SCgEdfN}" ]]": exit status=127, output="

E         "
E       assert '127' == '0'
E         - 127
E         + 0

../t/conftest.py:418: AssertionError

Feb 23 '23 04:02 akinomyoga

bash-completion bash-completion copied to clipboard

test: Intermittent test failures by timeouts, etc. with high loadavg

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1021580514 by @sudonym1

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1021630333 by @akinomyoga

Reply https://github.com/scop/bash-completion/pull/687#issuecomment-1025089837 by @scop

https://github.com/scop/bash-completion/pull/687#issuecomment-1025321800 by @sudonym1

https://github.com/scop/bash-completion/pull/687#issuecomment-1025351856 by @akinomyoga

bash-completion
bash-completion copied to clipboard