UF Crashes on Container Restart (9.2 and 9.1)
When running containers on 9.2.1 (78803f08aabb) or 9.1.4 (a414fc70250e), if the container is restarted, it fails to start with the following error:
TASK [splunk_universal_forwarder : Setup global HEC] ***************************
fatal: [localhost]: FAILED! => {
"changed": false
}
MSG:
POST/services/data/inputs/http/httpadmin********8089{'disabled': '0', 'enableSSL': '1', 'port': '8088', 'serverCert': '', 'sslPassword': ''}NoneNoneNone;;; AND excep_str: No Exception, failed with status code 404: {"text":"The requested URL was not found on this server.","code":404}
PLAY RECAP *********************************************************************
localhost : ok=67 changed=3 unreachable=0 failed=1 skipped=69 rescued=0 ignored=0
Thursday 18 April 2024 14:49:02 +0000 (0:00:00.588) 0:00:17.478 ********
===============================================================================
splunk_common : Start Splunk via CLI ------------------------------------ 1.59s
Gathering Facts --------------------------------------------------------- 0.95s
splunk_universal_forwarder : Setup global HEC --------------------------- 0.59s
splunk_common : Cleanup Splunk runtime files ---------------------------- 0.51s
splunk_common : Update Splunk directory owner --------------------------- 0.48s
splunk_common : Update /opt/splunk/etc ---------------------------------- 0.43s
splunk_common : Check for scloud ---------------------------------------- 0.41s
splunk_common : Set mgmt port ------------------------------------------- 0.40s
splunk_common : Find manifests ------------------------------------------ 0.38s
splunk_common : Check if UDS file exists -------------------------------- 0.32s
splunk_common : Configure to set Mgmt Mode as auto (Allows UDS) --------- 0.30s
splunk_common : Remove user-seed.conf ----------------------------------- 0.30s
splunk_common : Reset root CA ------------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Disable indexing on the current node -------------------- 0.29s
splunk_common : Ensure license path ------------------------------------- 0.29s
splunk_common : Get Splunk status --------------------------------------- 0.29s
splunk_common : Create .ui_login ---------------------------------------- 0.29s
splunk_common : Check if /sbin/updateetc.sh exists ---------------------- 0.29s
splunk_common : Enable splunktcp input ---------------------------------- 0.29s
9.0.9 (6315942c563f) appears unaffected.
Hi @JoePJisc,
I assume this happens on freshly installed UFs, not on upgrades?
I had the same error and it turned out that this was caused by SPLUNK_HOME_OWNERSHIP_ENFORCEMENT - see SECURITY.md.
When you run a newer UF as the container user splunk, you get a lot of warnings that the ownership enforcement is not working. These are only warnings, though, so nothing fails outright.
In this play, however, the warning turns into a real problem: https://github.com/splunk/splunk-ansible/blob/develop/roles/splunk_common/tasks/enable_admin_auth.yml#L6
The initial Splunk admin user setup processes the command's stdout, and here the warning ends up in the result, producing a broken passwd file:
[splunk@splunk-uf-0 splunkforwarder]$ pwd
/opt/splunkforwarder
[splunk@splunk-uf-0 splunkforwarder]$ cat etc/passwd
:admin:Warning: Attempting to revert the SPLUNK_HOME ownership::administrator:admin:::19853
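For comparison, an intact entry would carry the SHA-512 crypt hash in the field where the warning text landed; roughly like this (hash shortened, purely illustrative):
:admin:$6$<salt>$<hash>::administrator:admin:::19853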
I fixed this by overriding the play as follows, so the hash is produced by Python and the captured stdout contains nothing but the hash:
---
- name: Set admin access via seed
  when: first_run | bool
  block:
    - name: "Hash the password"
      command: "python -c 'import sys, crypt; print(crypt.crypt(sys.argv[1], crypt.mksalt(crypt.METHOD_SHA512)))' '{{ splunk.password }}'"
      register: hashed_pwd
      changed_when: hashed_pwd.rc == 0
      become: yes
      become_user: "{{ splunk.user }}"
      no_log: "{{ hide_password }}"
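The rest of the block can stay as upstream has it: the registered hash gets written into user-seed.conf, which splunkd reads on first start to create the admin user. A rough sketch of that companion step, assuming the upstream play's variable names (splunk.home, splunk.user, splunk.group, splunk.admin_user):

    - name: "Generate user-seed.conf"
      # write the clean hash where splunkd expects it on first start;
      # variable names are assumptions based on the snippet above
      ini_file:
        path: "{{ splunk.home }}/etc/system/local/user-seed.conf"
        section: user_info
        option: "{{ item.opt }}"
        value: "{{ item.val }}"
        owner: "{{ splunk.user }}"
        group: "{{ splunk.group }}"
      loop:
        - { opt: "USERNAME", val: "{{ splunk.admin_user }}" }
        - { opt: "HASHED_PASSWORD", val: "{{ hashed_pwd.stdout }}" }
      no_log: "{{ hide_password }}"

Since the Python one-liner prints nothing but the $6$ hash, hashed_pwd.stdout stays clean even when the ownership warning fires elsewhere.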
That solved it for me - maybe it helps you as well!
Anyway, the root cause here is, in the end, the set of issues around SPLUNK_HOME_OWNERSHIP_ENFORCEMENT, and I will create an issue to address those.
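Until that is fixed, you can also sidestep the problem by disabling the enforcement, which SECURITY.md documents via an environment variable. A minimal sketch as a docker-compose snippet (image tag, service name, and password are placeholders; I am assuming the variable accepts "false" as documented):

# sketch only: turn off the ownership enforcement per SECURITY.md
services:
  uf:
    image: splunk/universalforwarder:9.2.1
    environment:
      SPLUNK_START_ARGS: --accept-license
      SPLUNK_PASSWORD: "<password>"
      SPLUNK_HOME_OWNERSHIP_ENFORCEMENT: "false"

With the enforcement off, no warning is printed, so nothing pollutes the stdout that the admin-seed play captures.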