ansible.windows icon indicating copy to clipboard operation
ansible.windows copied to clipboard

KB5004244 appears to have broken win_reboot on Server 2019

Open dpower1985 opened this issue 2 years ago • 9 comments

SUMMARY

I first noticed this issue with the Out of bound patch Microsoft released last week for the Print Nightmare vulnerability, but it also appears to be happening with the update released yesterday for Server 2019 which replaced it, KB5004244. Once this update is installed the win_reboot module will successfully trigger a reboot but then never detect the system has come back up again. I've verified WinRM is running and reachable but Ansible doesn't detect it automatically. Oddly enough if I manually reboot the machine a second time then once it comes back up Ansible detects this and reconnects, but the next time the system is rebooted by the win_reboot module it fails to detect the server coming back up. It just waits for my reboot_timeout to expire and then fails the task.

I'm seeing this issue when building new machines and installing Windows Updates. If I explicitly exclude KB5004244 however then everything works fine and every win_reboot call works as expected.

ISSUE TYPE
  • Bug Report
COMPONENT NAME

ansible.windows.win_reboot

ANSIBLE VERSION
ansible [core 2.11.2]
  config file = /build/templates/ansible.cfg
  configured module search path = ['/build/templates/playbooks/modules', '/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible
  python version = 3.9.6 (default, Jul  1 2021, 07:39:48) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
  jinja version = 3.0.1
  libyaml = True
COLLECTION VERSION
# /usr/local/lib/python3.9/site-packages/ansible_collections
Collection      Version
--------------- -------
ansible.windows 1.7.0
CONFIGURATION
DEFAULT_CONNECTION_PLUGIN_PATH(/build/templates/ansible.cfg) = ['/build/templates/connection_plugins']
DEFAULT_MODULE_PATH(/build/templates/ansible.cfg) = ['/build/templates/playbooks/modules', '/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
DEFAULT_ROLES_PATH(/build/templates/ansible.cfg) = ['/build/templates/roles']
DEPRECATION_WARNINGS(/build/templates/ansible.cfg) = False
SYSTEM_WARNINGS(/build/templates/ansible.cfg) = False
OS / ENVIRONMENT

Target OS - Windows Server 2019

STEPS TO REPRODUCE
- name: Second update run
  ansible.legacy.win_shell: |
    Import-Module C:\PSWindowsUpdate\PSWindowsUpdate.psd1 -Force -ErrorAction Stop
    Get-WUInstall -NotCategory "Drivers" -Install -AcceptAll -IgnoreReboot -Verbose | Select-Object Result, KB, Size, Title | Format-Table -Autosize | Out-File C:\Windows\Temp\UpdateLog2.log
  vars:
    ansible_become: yes
    ansible_become_method: runas
    ansible_become_user: packer
  tags: 
    - windows_all

- name: Second update reboot
  ansible.windows.win_reboot:
    reboot_timeout: 3600
  tags: 
    - windows_all
EXPECTED RESULTS

TASK [windows_update_powershell : Second update run] ***************************[0m changed: [default][0m TASK [windows_update_powershell : Second update reboot] ************************* changed: [default]

ACTUAL RESULTS
TASK [windows_update_powershell : Second update run] ***************************[0m
changed: [default][0m
TASK [windows_update_powershell : Second update run reboot] ********************[0m
fatal: [default]: FAILED! => {"changed": false, "elapsed": 3612, "msg": "Timed out waiting for last boot time check (timeout=3600)", "rebooted": true}[0m

dpower1985 avatar Jul 14 '21 11:07 dpower1985

Done a bit more testing with this and it seems to occur when using an older ISO that hasn't had any updates applied to it for quite some time. I re-tested after slipstreaming KB5003711 and KB5003646 (2021-06 SSU and 2021-06 Cumulative Update) into the ISO and then running the same playbook the reboots are detected successfully.

Not sure what it is about installing KB5004244 without an older update that seems to break the win_reboot detection, but it can at least be worked around by starting out from a more up to date image (Though I'm not sure just how up to date it needs to be to resolve the issue)

dpower1985 avatar Jul 15 '21 14:07 dpower1985

Are you able to run with a high verbosity (-vvvvv) and share the full log. This will probably be a lot of text but it will help show what the pre reboot datestamp was set to and the value of the post reboot datestamp.

Basically how win_reboot works is that it

  1. runs Get-CimInstance -ClassName Win32_OperatingSystem -Property LastBootUpTime).LastBootUpTime.ToFileTime() to get the current time
  2. reboots the host
  3. runs the same command from step 1 until the value has changed (host is back online after the reboot)
  4. Runs a test command to verify the host is back online

It could be that step 3 is never returning a different value for some reason causing it to think the reboot hasn't occurred or it could be that step 4 is never true for x reason. Narrowing down what is causing the check to never complete will help try and identify where the problem is and what we have to do to potentially fix it.

jborean93 avatar Jul 21 '21 19:07 jborean93

After adding the -vvvv tag I get this until the timeout is hit,

 windows2019_corp: TASK [windows_update_powershell : Pre-Patch reboot] ****************************
    windows2019_corp: task path: /build/templates/roles/windows_update_powershell/tasks/main.yml:12
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: rebooting server...
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: validating reboot
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.66> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.66

When I check the actual VM I can see that it definitely has actually rebooted and that WinRM is still running.

I've tried using a more up to date ISO file from MSDN that has all updates up to June 2021 included, but it seems that even manually rebooting the VM isn't helping it reconnect anymore either.

The only things I've run prior to this reboot are,

- name: Remove Internet Explorer and SMB1
  ansible.windows.win_optional_feature:
    name: 
      - Internet-Explorer-Optional-amd64
      - smb1protocol
    state: absent
  async: 600
  poll: 30
  retries: 3
  delay: 10
  tags: 
    - windows_all

- name: Enable .Net Framework 3.5
  ansible.windows.win_feature:
    name:
      - NET-Framework-Features
      - NET-Framework-Core
    source: D:\sources\sxs
    state: present
  async: 600
  poll: 30
  retries: 3
  delay: 10
  tags: 
    - windows_all

I reverted back to my older ISO and the pre-patch reboot now does work,

    windows2019_corp: TASK [windows_update_powershell : Pre-Patch reboot] ****************************
    windows2019_corp: task path: /build/templates/roles/windows_update_powershell/tasks/main.yml:12
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: rebooting server...
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: validating reboot
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting to get system boot time
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: <10.134.192.67> ESTABLISH WINRM CONNECTION FOR USER: packer on PORT 5985 TO 10.134.192.67
    windows2019_corp: ansible.windows.win_reboot: attempting post-reboot test command
    windows2019_corp: EXEC (via pipeline wrapper)
    windows2019_corp: ansible.windows.win_reboot: system successfully rebooted
    windows2019_corp: changed: [default] => {
    windows2019_corp:     "changed": true,
    windows2019_corp:     "elapsed": 166,
    windows2019_corp:     "rebooted": true
    windows2019_corp: }
    windows2019_corp:

dpower1985 avatar Jul 22 '21 07:07 dpower1985

5 v's should show the command result info which is what is what I'm hoping to see here.

jborean93 avatar Jul 22 '21 08:07 jborean93

Sorry about that! That's what I get for typing rather than pasting! That has run through now and you weren't joking about a lot of text. I've attached the log file due to the length. As before the reboot after applying the update ceases to reconnect, whilst multiple reboots prior to that completed fine.

The WinRM service is being configured for Automated Delayed startup prior to any reboots taking place.

outputlog.txt

dpower1985 avatar Jul 22 '21 12:07 dpower1985

Thanks for that, there's a few things to note:

  • Your collection list indicates you are using ansible.windows==1.7.0 but the output is showing that it's using older code so win_reboot is being prioritised from some other path
    • For example the command used to get the boot time is (Get-WmiObject -ClassName Win32_OperatingSystem).LastBootUpTime and not the Get-CimInstance one.
    • Some of the output would indicate it's using whatever is at /root/.ansible/collections/ansible_collections/ansible/windows
  • The reason why win_reboot is timing out is not because it's getting the same boot time but rather it's failing to reconnect to host during the reboot
    • I see the host go down and the remaining errors are Connection to 10.134.192.74 timed out. (connect timeout=6)
    • There's no more reconnections, it simply keeps on seeing connection timeouts until win_reboot itself times out

At this point I suspect either the WinRM service didn't come back up properly or maybe Ansible isn't resetting the connection properly. Some things to try and check

  • Watch the actual console screen of the host and see if the logon screen is actually appearing while Ansible is waiting for it to come back online
  • Try random win_ping requests while Ansible is waiting for it to come back online
    • This would be done in a separate process just to see if maybe the connection Ansible is using is getting jammed in some perculiar way
    • If this also fails with connection timeouts then Ansible isn't the problem but something on the host (updates on 2016/2019 can take forever to properly install sometimes)
  • Try to update the collection to 1.7.0
    • As mentioned earlier the log output indicates you are using an older version of the collection and not 1.7.0
    • The new version has a refactored setup of win_reboot which should be smarter around handling connection resets and checking for it to be online
  • Try using ansible_connection: psrp instead of WInRM
    • They both operate on WinRM but PSRP uses the PowerShell remoting layer
    • Might be a bit smarter/more efficient and could potentially solve the problem - unlikely but worth a shot

I am also curious though, is there a reason why you are using PSWindowsUpdates to install the updates rather than the win_updates module. The module has gone through quite a large refactor in 1.7.0 which should fix some of the problems that are out there but it would be good to find out if there were other issues I wasn't aware about.

jborean93 avatar Jul 22 '21 21:07 jborean93

Thanks for the guidance, I'll give those a try and come back with some results!

In regards to the collection, I suspect the older one is left from when we were previously using an older version of Ansible that made use of Python 2. After I updated my containers to remove that Ansible it looks like it left behind that older collection. I've removed that now so hopefully it should now direct to the 1.7.0 collection.

In regards to PSWindowsUpdates, I was having issues where win_updates was seeming to fail with installing some cumulative updates correctly. I raised an issue but I don't think anyone else was able to replicate it. Now that I have switched to using newer Windows ISO files though it's likely worth me revisiting that option. It's also possible my issue was with using the older 1.5.0 collection, again due to the issues with the older Ansible instance.

dpower1985 avatar Jul 23 '21 10:07 dpower1985

So I removed the old collection as suggested and ensured that only ansible.windows.win_reboot was being used in all playbooks and re-ran the test.

As before the process seemed to stop after the second update reboot stage with the constant connection time outs. I was monitoring the VM console and confirmed the system had rebooted and come back to the login prompt. I left it on this for 25 minutes then from the same location that was running the build I ran a simple win_ping playbook,

  - name: Ping test
    ansible.windows.win_ping:

But this also failed to connect,

ansible-playbook [core 2.11.2]
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/local/bin/ansible-playbook
  python version = 3.9.6 (default, Jul  1 2021, 07:39:48) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
  jinja version = 3.0.1
  libyaml = True
No config file found; using defaults
setting up inventory plugins
host_list declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
script declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
auto declined parsing /etc/ansible/hosts as it did not pass its verify_file() method
Parsed /etc/ansible/hosts inventory source with ini plugin
Loading collection ansible.windows from /usr/local/lib/python3.9/site-packages/ansible_collections/ansible/windows
Loading callback plugin default of type stdout, v2.0 from /usr/local/lib/python3.9/site-packages/ansible/plugins/callback/default.py
Attempting to use 'default' callback.
Skipping callback 'default', as we already have a stdout callback.
Attempting to use 'junit' callback.
Attempting to use 'minimal' callback.
Skipping callback 'minimal', as we already have a stdout callback.
Attempting to use 'oneline' callback.
Skipping callback 'oneline', as we already have a stdout callback.
Attempting to use 'tree' callback.

PLAYBOOK: ping.yml *************************************************************
Positional arguments: ping.yml
verbosity: 5
connection: smart
timeout: 10
become_method: sudo
tags: ('all',)
inventory: ('/etc/ansible/hosts',)
forks: 5
1 plays in ping.yml

PLAY [10.134.192.76] ***********************************************************

TASK [Gathering Facts] *********************************************************
task path: /home/ansible/ping.yml:2
redirecting (type: modules) ansible.builtin.setup to ansible.windows.setup
Using module file /usr/local/lib/python3.9/site-packages/ansible_collections/ansible/windows/plugins/modules/setup.ps1
Pipelining is enabled.
<10.134.192.76> ESTABLISH WINRM CONNECTION FOR USER: administrator on PORT 5985 TO 10.134.192.76
<10.134.192.76> WINRM CONNECT: transport=basic endpoint=http://10.134.192.76:5985/wsman
<10.134.192.76> WINRM CONNECTION ERROR: HTTPConnectionPool(host='10.134.192.76', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fcaec5b2340>, 'Connection to 10.134.192.76 timed out. (connect timeout=120)'))
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.9/http/client.py", line 1257, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1303, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1252, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.9/http/client.py", line 1012, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.9/http/client.py", line 952, in send
    self.connect()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.9/site-packages/urllib3/connection.py", line 174, in _new_conn
    raise ConnectTimeoutError(
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7fcaec5b2340>, 'Connection to 10.134.192.76 timed out. (connect timeout=120)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='10.134.192.76', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fcaec5b2340>, 'Connection to 10.134.192.76 timed out. (connect timeout=120)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/ansible/plugins/connection/winrm.py", line 426, in _winrm_connect
    self.shell_id = protocol.open_shell(codepage=65001)  # UTF-8
  File "/usr/local/lib/python3.9/site-packages/winrm/protocol.py", line 166, in open_shell
    res = self.send_message(xmltodict.unparse(req))
  File "/usr/local/lib/python3.9/site-packages/winrm/protocol.py", line 243, in send_message
    resp = self.transport.send_message(message)
  File "/usr/local/lib/python3.9/site-packages/winrm/transport.py", line 334, in send_message
    response = self._send_message_request(prepared_request, message)
  File "/usr/local/lib/python3.9/site-packages/winrm/transport.py", line 339, in _send_message_request
    response = self.session.send(prepared_request, timeout=self.read_timeout_sec)
  File "/usr/local/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/requests/adapters.py", line 504, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='10.134.192.76', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fcaec5b2340>, 'Connection to 10.134.192.76 timed out. (connect timeout=120)'))
fatal: [10.134.192.76]: UNREACHABLE! => {
    "changed": false,
    "msg": "basic: HTTPConnectionPool(host='10.134.192.76', port=5985): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7fcaec5b2340>, 'Connection to 10.134.192.76 timed out. (connect timeout=120)'))",
    "unreachable": true
}

PLAY RECAP *********************************************************************
10.134.192.76              : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

Oddly standard pings to the server had suddenly stopped working as well. This included from another machine on the same network. On a hunch I went into the server in progress of being built and turned off the Windows Firewall, as soon as I did that Ansible was able to connect again and my in-progress win_reboot immediately connected and proceeded with the next tasks.

My current assumption is that something about KB5004244 may be making some changes to the Windows Firewall that is preventing WinRM from being able to reconnect after installation (As everything works fine up until the point that KB is installed). What's odd is that, looking at the Windows Firewall, I can see the built-in rules for Windows Remote Management are both enabled already.

dpower1985 avatar Jul 23 '21 13:07 dpower1985

Update - So ran another test and this time when I reached the problem reboot stage rather than disabling the Windows Firewall I created an additional WinRM rule allowing traffic on TCP Port 5985 on all network profile's and set the Remote IP Address scope to the IP of my Ansible server. As soon as the rule was created Ansible was able to reconnect and progress.

I'll need to do some additional before/after testing to try and pin down what exactly is changing but it definitely seems that the KB is making some kind of alteration to the built-in WinRM Firewall rules.

dpower1985 avatar Jul 23 '21 14:07 dpower1985