hvac
Integration tests can't be run on Windows
As reported here:
- https://github.com/hvac/hvac/pull/1006#issuecomment-1595876686
It seems like there's an issue with the way the test Vault server is launched from the test suite, related to I/O redirection.
Since we don't currently run integration tests in Windows in CI, we wouldn't have caught this.
We should ensure that this works so that we can support contributors on multiple platforms.
I tested it today, and make test without any modifications starts the server.
But only the first few tests are successful; after that they start failing.
If I modify Popen in server manager:
https://github.com/hvac/hvac/blob/31aca14a73ba83a0075d69c0781bb39b51ced5a3/tests/utils/server_manager.py#L57-L59
and remove the PIPEs for stdout and stderr, the problem no longer occurs:
process = subprocess.Popen(
    command
)
(This is not really a fix, because the Vault output is now mixed together with the test output.)
I tested it with two amd64 binaries of vault: v1.13.3 and v1.10.0.
It looks like some buffer overflows after Vault has been running for some time. I will analyze this further.
I redirected stdout and stderr into two files. That also worked without errors. The file sizes after running one test suite were quite small (2k stdout and 7k stderr).
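For anyone who wants to repeat the file-redirection experiment, here is a minimal sketch. It uses a stand-in child process instead of the real vault command, and the helper name spawn_with_log_files and the log file names are hypothetical (they are not taken from server_manager.py).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def spawn_with_log_files(command, log_dir):
    """Start `command` with stdout/stderr written to files instead of pipes.

    Hypothetical helper for illustration only; file names are assumptions.
    The caller is responsible for closing the returned file objects.
    """
    log_dir = Path(log_dir)
    stdout_file = open(log_dir / "stdout.log", "wb")
    stderr_file = open(log_dir / "stderr.log", "wb")
    process = subprocess.Popen(command, stdout=stdout_file, stderr=stderr_file)
    return process, stdout_file, stderr_file


if __name__ == "__main__":
    # Stand-in for the Vault server: a child that writes to both streams.
    child = [
        sys.executable,
        "-c",
        "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        process, out_f, err_f = spawn_with_log_files(child, tmp)
        process.wait()
        # Close our handles before reading and before the directory is removed.
        out_f.close()
        err_f.close()
        print(Path(tmp, "stdout.log").read_text())
        print(Path(tmp, "stderr.log").read_text())
```

Because the child writes straight to files, nothing on the Python side has to read from a pipe while the server is running.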
I tried to narrow down the error on my local machine with the following test code added to the end of server_manager.py:
if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    manager = ServerManager(
        config_paths=["C:/Users/feri/git/hvac_FORK_MAIN/tests/config_files/vault-tls.hcl"],
        client=create_client(),
        use_consul=False,
    )
    manager.start()
    manager.initialize()
    manager.unseal()
    logger.debug("breakpoint")
I also changed the paths in vault-tls.hcl to absolute paths.
This code quits after 30 seconds with the following error message:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='127.0.0.1', port=8200): Read timed out. (read timeout=30)
The stack trace shows that it was in submit_unseal_keys(), and it reproducibly stops when submitting the third key (the one that unseals the vault, 3/5).
For testing, I set a breakpoint in submit_unseal_keys() and stopped at the third call. Then I sent the unseal request using curl from a command line:
curl -k -XPUT -d "{\"migrate\": false, \"key\": \"cb7db15a29f2f7368821391fa841afd6424f43f673b634bb94b804b2aa31194870\"}}" https://localhost:8200/v1/sys/unseal
This curl call hangs forever and does not return.
Then I returned to the Python debugger, in the stack frame of ServerManager.unseal(), and executed the following command:
manager._processes[0].communicate()
and suddenly the curl command returned with a success message.
So the problem seems to be that the communication is somehow blocked. Maybe unsealing triggers some special "blocking I/O" mode in Vault, or it waits for input from stdin, which is unblocked by the communicate() call.
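One possible explanation that fits this observation, though it is not confirmed in this thread, is that the child blocks on a write once an OS pipe buffer is full and nobody is reading from it; communicate() reads the pipes and would release it. The sketch below assumes that cause and drains both pipes in background threads so the child can always write. It uses a stand-in child process, and the helper names are hypothetical.

```python
import subprocess
import sys
import threading


def _drain(stream, sink):
    """Read a pipe until EOF so the child never blocks on a full buffer."""
    for chunk in iter(lambda: stream.read(4096), b""):
        sink.append(chunk)
    stream.close()


def spawn_with_drained_pipes(command):
    """Hypothetical helper: Popen with PIPEs that are read continuously."""
    process = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    captured = {"stdout": [], "stderr": []}
    threads = []
    for name, stream in (("stdout", process.stdout), ("stderr", process.stderr)):
        thread = threading.Thread(
            target=_drain, args=(stream, captured[name]), daemon=True
        )
        thread.start()
        threads.append(thread)
    return process, captured, threads


if __name__ == "__main__":
    # Stand-in child that writes far more to stderr than a pipe buffer holds.
    child = [
        sys.executable,
        "-c",
        "import sys; sys.stderr.write('x' * 1000000); sys.stdout.write('done')",
    ]
    process, captured, threads = spawn_with_drained_pipes(child)
    process.wait()  # would risk hanging here if nothing read the pipes
    for thread in threads:
        thread.join()
    print(len(b"".join(captured["stderr"])), b"".join(captured["stdout"]))
```

With this approach the captured output is still available afterwards, so something like the log-file writing in ServerManager.stop() could in principle use the collected chunks instead of calling communicate().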
Removing the stdout/stderr piping would break the "HVAC_OUTPUT_VAULT_STDERR" environment variable functionality in ServerManager.stop(), which writes stdout to vault{process_num}_stderr.log and stderr to vault{process_num}_stdout.log.
Maybe it is a problem with stdin?
I found some interesting information about pipesize here:
https://docs.python.org/3/library/subprocess.html#popen-constructor
This seems to explain the differences between Linux and Windows.
Just as a quick test I set pipesize to 999999 and it seemed to work. I will test it more thoroughly later.
@ferenc-hechler this is all very interesting, the troubleshooting is extremely helpful, thank you!
As a workaround I modified server_manager.py (changed lines are marked with [*]):
[*] import platform
    ...
    class ServerManager:
        ....
        def start(self):
            ...
            cluster_ready = False
            for config_path in self.config_paths:
                command = ["vault", "server", "-config=" + config_path]
                logger.debug(f"Starting vault server with command: {command}")
[*]             # workaround till issue #1007 is resolved
[*]             stdxxx_pipe = subprocess.DEVNULL if platform.system() == "Windows" else subprocess.PIPE
                process = subprocess.Popen(
[*]                 command, stdout=stdxxx_pipe, stderr=stdxxx_pipe
                )
                self._processes.append(process)
                logger.debug(f"Spawned vault server with PID {process.pid}")
            ....
Now the tests that start a local Vault server are working, but the tests get stuck in test_ldap. The problem is that the LdapServer is started using "python-ldap-test". I found a description of a workaround for this problem on Windows: https://github.com/zoldar/python-ldap-test/issues/17
I manually applied the fix, and now the test_ldap integration tests are working. Note: on first start, a pop-up informed me that a published port was blocked by the Windows firewall. I had to allow this.
Info: It seems the python-ldap-test project is no longer maintained. The PR for the Windows fix was approved, but has not been merged since Nov. 2022.
Summing it all up, I would say testing on Windows is not really supported. :-)
Maybe the workarounds can help others get local testing running, but I am not sure whether it is worth getting everything up and running on Windows.
wow.. that really is unfortunate, and good call out on the defunct ldap library :-/
I do not want to say that running tests on Windows is unsupported, especially because I want to add tests on Windows to our CI.
I am also a Windows user, although I develop hvac (and almost everything actually) in Linux (via WSL), so I haven't seen these things before.
I really do not like an answer of "just use linux" and I intend to shore up testing on Windows, but as a contributor/developer you may want to check out WSL in the meantime. vscode integrates with it very well.
Thank you again, this information and troubleshooting is invaluable!