vic
vic copied to clipboard
ci test 6-15-SysLog failed due to inconsistent pid for docker-engine-server
Seen in ci builds 11668, 11672 and 11673.
Test-Cases.Group6-VIC-Machine.6-15-Syslog-VCH-11668-3847-container-logs.zip
In that test, this line https://github.com/vmware/vic/blob/master/tests/test-cases/Group6-VIC-Machine/6-15-Syslog.robot#L57 obtained the pid of docker-engine-server as 286. However, later on when checking the syslog here , the syslog showed the following:
Jun 18 05:49:01 192.168.31.181 docker-engine-server[288]: Unable to load CAs for registry access in config
Jun 18 05:49:01 192.168.31.181 docker-engine-server[288]: Waiting for portlayer to come up
Jun 18 05:49:03 192.168.31.181 docker-engine-server[288]: Portlayer is up and responding to pings
Jun 18 05:49:03 192.168.31.181 docker-engine-server[288]: Refreshing repository cache
So the pids obtained from the syslog and from ps -C docker-engine-server -o pid= (L57) are inconsistent.
@hmahmood Any thoughts?
Also seen in https://ci.vcna.io/vmware/vic/11668.
I think this is just a race. All the tests use the same syslog server, so the mismatch is because the logs are from another VCH. I think just adding retries to this test should fix this.
Many tests are now failing this test. I don't know that it's the same because there's no syslog output retrieved rather than not finding the expected lines: https://ci.vcna.io/vmware/vic/12976
We do get a message that we connect to the syslog server:
time="2017-08-04T22:43:01Z" level=info msg="successfully connected to syslog server"
and we do not get overflow messages which I'd expect if they weren't being sent.
Last entry in the syslog log is from: Aug 2 02:05:37 Photon vic-init[185]: and that log has rolled over to syslog.1 which explains why we're getting an empty response back.
It does imply that the log message may need to include VCH name and the test should check for it - we do restrict by IP but that's going to be frequently reused in CI.
Other items of note:
- vic-init is horrifically verbose when run with debug levels - we need to revisit at what log levels it duplicates the log output of the sessions, and when that duplication goes to anywhere except the serial logging.
- vic-init never updates the IP address for logging with so always logs with
Photonfor all clients. This is going to cause significant confusion.
Using a direct command to log to the syslog service also failed. No evidence of anything wrong in the journalctl unit logs, systemd thinks everything is OK.
Restart of service and retried the logging and it shows up in the syslog as expected. Command used to verify:
logger -n 192.168.31.25 -t ghicken-test -p user.notice "Direct manual test of status"
I do not believe this is due to rate limiting as all remote logging failed to show after the Aug 2nd log mentioned in the previous comment.
https://ci.vcna.io/vmware/vic/13830 https://ci.vcna.io/vmware/vic/13833
PID mismatch seen in:
https://ci.vcna.io/vmware/vic/16123 https://ci.vcna.io/vmware/vic/16126 https://ci.vcna.io/vmware/vic/16128
I recommend updating the test to filter by IP address (or we update syslog support to log with VCH name). Can also include a date range (see here for use of system time from the VCH).
Just as bad as what we're seeing here would be the fact that there's likely to be similar PIDs between VCHs because there's so little that runs in the endpointVMs. We do not want to see old data and think it's good.
Hasnt been seen since move to HaaS. Lowering priority to p2, check for pid collision.
Seen in https://ci-vic.vmware.com/vmware/vic/19918/7
We logged into the endpointVM successfully but for some reason the remote execution of the ps command failed to provide the expected output. We do not check that the command succeeds and therefore look for a pattern in the syslog output that does not include the pid - unsurprisingly we do not find a match.
Logging into '10.158.214.11:22' as 'root'.
Read output: Warning: your password will expire in 0 days [1;31mroot@ [ [0m~[1;31m ]# [0m
...
Executing command 'ps -C docker-engine-server -o pid='.
Command exited with return code 1.
${pid} =
Most recent comment https://github.com/vmware/vic/issues/5522#issuecomment-417350365 has been resolved with #8251 (#8250)