hvac Integration tests can't be run on Windows

Open briantist opened this issue 1 year ago • 7 comments

As reported here:

https://github.com/hvac/hvac/pull/1006#issuecomment-1595876686

It seems like there's an issue with the way the test Vault server is launched from the test suite, related to I/O redirection.

Since we don't currently run integration tests on Windows in CI, we wouldn't have caught this.

We should ensure that this works so that we can support contributors on multiple platforms.

Jun 18 '23 00:06 briantist

I tested it today: make test without any modifications starts the server, but only the first few tests succeed; then they start failing.

If I modify the Popen call in the server manager:

https://github.com/hvac/hvac/blob/31aca14a73ba83a0075d69c0781bb39b51ced5a3/tests/utils/server_manager.py#L57-L59

and remove the PIPEs for stdout and stderr, the problem no longer occurs:

process = subprocess.Popen(
    command
)

(This is not really a fix, because the Vault output is now mixed in with the test output.)

I tested it with two amd64 Vault binaries: v1.13.3 and v1.10.0.

It looks like some kind of buffer overflow occurs after Vault has been running for some time. I will analyze this further.

Jun 18 '23 08:06 ferenc-hechler

I redirected stdout and stderr to two files. That also worked without errors. The file sizes after running one test suite were quite small (2 KB for stdout and 7 KB for stderr).
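For reference, redirecting a child's output to files only requires passing open file objects to Popen instead of subprocess.PIPE. A minimal sketch; the helper name start_with_log_files and the log-file naming are illustrative assumptions, not hvac code:

```python
import subprocess
from pathlib import Path


def start_with_log_files(command, log_dir, name):
    """Start `command` with stdout/stderr written to log files (hypothetical helper).

    A regular file has no small fixed-size buffer that the parent must keep
    draining, so the child cannot block on a full pipe.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    with open(log_dir / f"{name}_stdout.log", "wb") as out, open(
        log_dir / f"{name}_stderr.log", "wb"
    ) as err:
        # Popen hands the child its own copies of these file handles, so
        # closing ours when the with-block exits does not affect the child.
        return subprocess.Popen(command, stdout=out, stderr=err)
```

Used in place of the Popen call in ServerManager.start(), this would look like start_with_log_files(["vault", "server", "-config=" + config_path], "logs", "vault0").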

Jun 18 '23 12:06 ferenc-hechler

I tried to narrow down the error on my local machine with the following test code added to the end of server_manager.py:

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    manager = ServerManager(
        config_paths=["C:/Users/feri/git/hvac_FORK_MAIN/tests/config_files/vault-tls.hcl"],
        client=create_client(),
        use_consul=False,
    )
    manager.start()
    manager.initialize()
    manager.unseal()
    logger.debug("breakpoint")

I also changed the paths in vault-tls.hcl to absolute paths.

This code quits after 30 seconds with the following error message:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='127.0.0.1', port=8200): Read timed out. (read timeout=30)

The stack trace shows that it was in submit_unseal_keys(). It reproducibly stops when submitting the third key (the third of five, which unseals Vault).

For testing, I set a breakpoint in submit_unseal_keys() and stopped at the third call. Then I sent the unseal request with curl from the command line:

curl -k -XPUT -d "{\"migrate\": false, \"key\": \"cb7db15a29f2f7368821391fa841afd6424f43f673b634bb94b804b2aa31194870\"}" https://localhost:8200/v1/sys/unseal

This curl call hangs forever and does not return.
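The same unseal request can also be sent from Python with an explicit timeout, so that a hang surfaces as an error instead of blocking forever. This is a standard-library sketch, not hvac's own client code: build_unseal_request and send_unseal are hypothetical names, the address https://localhost:8200 is taken from the curl call, and certificate verification is disabled to mirror curl's -k.

```python
import json
import ssl
import urllib.request


def build_unseal_request(vault_addr, key, migrate=False):
    """Build a PUT /v1/sys/unseal request equivalent to the curl call."""
    body = json.dumps({"migrate": migrate, "key": key}).encode("utf-8")
    return urllib.request.Request(
        url=f"{vault_addr}/v1/sys/unseal",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send_unseal(request, timeout=10):
    """Send the request; a socket operation idle for `timeout` seconds raises.

    Certificate verification is disabled (like curl -k). Only do this against
    a local test server with a self-signed certificate.
    """
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send_unseal(build_unseal_request("https://localhost:8200", key), timeout=30) would raise a timeout error in the situation described here rather than hang.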

Then I returned to the Python debugger, in the stack frame of ServerManager.unseal(), and executed the following command:

manager._processes[0].communicate()

and suddenly the curl command returns with a success message.

So the problem seems to be that the communication is somehow blocked. Maybe unsealing triggers some kind of "blocking I/O" mode in Vault, or Vault waits for input on stdin, which the communicate() call unblocks.

Removing the stdout/stderr piping would break the "HVAC_OUTPUT_VAULT_STDERR" environment variable functionality in ServerManager.stop(), which writes stdout to vault{process_num}_stderr.log and stderr to vault{process_num}_stdout.log.

Maybe it is a problem with stdin?
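The observation that communicate() unblocks Vault matches the deadlock the Python subprocess documentation warns about for stdout=PIPE/stderr=PIPE: if the parent never reads a pipe, the child blocks as soon as the OS pipe buffer is full. Because the default buffer size differs between platforms, this could explain a failure that only shows up on Windows. A small self-contained reproduction, independent of Vault (demonstrate_pipe_blocking is a hypothetical name):

```python
import subprocess
import sys

# Child that writes 1 MiB to stdout, well above the default pipe capacity on
# common platforms (64 KiB on Linux).
CHILD_CODE = "import sys; sys.stdout.write('x' * (1024 * 1024)); sys.stdout.flush()"


def demonstrate_pipe_blocking(wait_seconds=5.0):
    """Return (blocked, bytes_read) for a child writing into an unread PIPE."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE], stdout=subprocess.PIPE)
    try:
        # Nothing reads the pipe yet, so once its buffer is full the child's
        # write() blocks and the child cannot exit: wait() times out.
        proc.wait(timeout=wait_seconds)
        blocked = False
    except subprocess.TimeoutExpired:
        blocked = True
    # communicate() reads the pipe until EOF, which unblocks the child and
    # lets it finish, much like calling communicate() in the debugger did.
    out, _ = proc.communicate()
    return blocked, len(out)
```

Calling demonstrate_pipe_blocking() should report that the child was blocked and that all 1 MiB arrived once communicate() drained the pipe.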

Jun 18 '23 15:06 ferenc-hechler

I found some interesting information about pipesize here: https://docs.python.org/3/library/subprocess.html#popen-constructor

This seems to explain the differences between Linux and Windows.

As a quick test, I set pipesize to 999999 and it seemed to work. I will test it more thoroughly later.
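Another option, not discussed so far in this thread, is to keep the pipes but read them continuously in background threads. The output stays available for the HVAC_OUTPUT_VAULT_STDERR log files, while the child never waits on a full buffer regardless of the platform's pipe size. A minimal sketch; DrainedProcess is a hypothetical helper, not part of hvac:

```python
import subprocess
import threading


class DrainedProcess:
    """Run a command with stdout/stderr piped, reading both pipes in
    background threads so the child never blocks on a full pipe buffer."""

    def __init__(self, command):
        self.process = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        self.stdout_chunks = []
        self.stderr_chunks = []
        self._threads = [
            threading.Thread(
                target=self._drain,
                args=(self.process.stdout, self.stdout_chunks),
                daemon=True,
            ),
            threading.Thread(
                target=self._drain,
                args=(self.process.stderr, self.stderr_chunks),
                daemon=True,
            ),
        ]
        for thread in self._threads:
            thread.start()

    @staticmethod
    def _drain(pipe, chunks):
        # Iterating a binary pipe yields lines until EOF (the child exited or
        # closed its end); each line is stored as raw bytes.
        with pipe:
            for line in pipe:
                chunks.append(line)

    def wait(self, timeout=None):
        """Wait for the child to exit, then return (stdout, stderr) as bytes."""
        self.process.wait(timeout=timeout)
        for thread in self._threads:
            thread.join()
        return b"".join(self.stdout_chunks), b"".join(self.stderr_chunks)

    def stop(self, timeout=10):
        """Terminate the child (e.g. a Vault server) and return its output."""
        if self.process.poll() is None:
            self.process.terminate()
        return self.wait(timeout=timeout)
```

In ServerManager, DrainedProcess(command) could stand in for the bare subprocess.Popen call in start(), and stop() could write the collected stderr bytes to the log file instead of calling communicate().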

Jun 18 '23 16:06 ferenc-hechler

@ferenc-hechler this is all very interesting, the troubleshooting is extremely helpful, thank you!

Jun 18 '23 17:06 briantist

As a workaround, I modified server_manager.py (changed lines are marked with [*]):

[*]import platform
...
class ServerManager:
    ....
    def start(self):
        ...
        cluster_ready = False
        for config_path in self.config_paths:
            command = ["vault", "server", "-config=" + config_path]
            logger.debug(f"Starting vault server with command: {command}")
[*]         # workaround till issue #1007 is resolved
[*]         stdxxx_pipe = subprocess.DEVNULL if platform.system() == "Windows" else subprocess.PIPE
            process = subprocess.Popen(
[*]             command, stdout=stdxxx_pipe, stderr=stdxxx_pipe
            )
            self._processes.append(process)
            logger.debug(f"Spawned vault server with PID {process.pid}")
        ....

Now the tests that start a local Vault server work, but the tests get stuck in test_ldap. The problem is that the LdapServer is started using "python-ldap-test". I found a description of a workaround for this problem on Windows: https://github.com/zoldar/python-ldap-test/issues/17

I manually applied the fix, and now the test_ldap integration tests work. Note: on the first start, a pop-up informed me that a published port was blocked by the Windows firewall, and I had to allow it.

Info: it seems the python-ldap-test project is no longer maintained. The PR for the Windows fix was approved, but it has not been merged since Nov. 2022.

Summing it all up, I would say testing on Windows is not really supported. :-)

Maybe the workarounds can help others enable local testing, but I am not sure it is worth getting everything up and running on Windows.

Jun 19 '23 22:06 ferenc-hechler

wow... that really is unfortunate, and good callout on the defunct LDAP library :-/

I do not want to say that running tests on Windows is unsupported, especially because I want to add tests on Windows to our CI. I am also a Windows user, although I develop hvac (and almost everything, actually) on Linux (via WSL), so I haven't seen these things before.

I really do not like an answer of "just use Linux", and I intend to shore up testing on Windows, but as a contributor/developer you may want to check out WSL in the meantime. VS Code integrates with it very well.

Thank you again, this information and troubleshooting is invaluable!

Jun 24 '23 03:06 briantist
