f5-appsvcs-extension
Service Discovery Causes Massive timeout issues with UCS Creation
Environment
- Application Services Version: latest and several other versions tested (1.40+), all with the same issue
- BIG-IP Version: 17.1.1
Summary
When creating a UCS backup of a BIG-IP, the UCS takes far longer to create than normal, as described in https://cdn.f5.com/product/bugtracker/ID985329.html
The delay stems from the following errors, which appear in /var/log/ltm:
Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed task: /shared/iapp/build-package/bf9e1f3e-7826-4728-b8f0-67a2fb6b4b40: rpmbuild command failed: com.f5.rest.workers.shell.CommandExecuteException: Command execution process killed
Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed to get getRPM build response within timeout for f5-service-discovery
Mar 28 12:23:32 bigip.f5demo.net info iAppsLX_save_pre[14953]: Exporting: f5-appsvcs - /var/config/rest/iapps/f5-appsvcs
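To pick these failures out of a longer log, the following is a minimal Python sketch. It assumes only the syslog layout visible in the entries above (timestamp, host, level, process[pid], message). The names `LogEntry`, `parse_line`, and `timed_out_packages` are hypothetical helpers for illustration and are not part of AS3, Service Discovery, or BIG-IP.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Matches syslog-style lines like the ones above, e.g.
# "Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<level>\w+) "
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
# Pulls the package name out of "... within timeout for <package>"
TIMEOUT_RE = re.compile(r"within timeout for (?P<package>\S+)")


@dataclass
class LogEntry:
    ts: str
    host: str
    level: str
    proc: str
    pid: int
    msg: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one log line into its fields; return None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["ts"], m["host"], m["level"], m["proc"], int(m["pid"]), m["msg"])


def timed_out_packages(lines: Iterable[str]) -> List[str]:
    """Packages named in iAppsLX_save_pre 'err' timeout messages, first-seen order, no duplicates."""
    found: List[str] = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry.proc != "iAppsLX_save_pre" or entry.level != "err":
            continue
        m = TIMEOUT_RE.search(entry.msg)
        if m and m["package"] not in found:
            found.append(m["package"])
    return found


if __name__ == "__main__":
    sample = [
        "Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: "
        "Failed to get getRPM build response within timeout for f5-service-discovery",
        "Mar 28 12:23:32 bigip.f5demo.net info iAppsLX_save_pre[14953]: "
        "Exporting: f5-appsvcs - /var/config/rest/iapps/f5-appsvcs",
    ]
    print(timed_out_packages(sample))  # ['f5-service-discovery']
```

Run against a copy of /var/log/ltm, this lists every iApps LX package whose RPM build timed out during the UCS pre-save step.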
With AS3 and Service Discovery removed, a UCS backup on a clean BIG-IP (nothing configured on it) takes ~20 seconds. With them installed, it can take 2-5 minutes because of this timeout. I have also seen cases where the UCS is never created at all.
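To measure the difference between ~20 seconds and 2-5 minutes directly on the box, the following minimal sketch times a command with Python's standard `subprocess` module. It assumes Python 3.7+ is available wherever it runs and that `tmsh` is on PATH for the real measurement. `time_command` and the UCS name `timing-test` are hypothetical.

```python
import subprocess
import time
from typing import List, Optional, Tuple


def time_command(cmd: List[str], timeout_s: float = 600.0) -> Tuple[float, Optional[int]]:
    """Run cmd and return (elapsed_seconds, returncode).

    returncode is None if the command did not finish within timeout_s;
    subprocess.run kills the child process in that case.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
        returncode: Optional[int] = result.returncode
    except subprocess.TimeoutExpired:
        returncode = None
    return time.monotonic() - start, returncode


# Hypothetical UCS name; "tmsh save sys ucs <name>" is the standard tmsh UCS save.
UCS_SAVE_CMD = ["tmsh", "save", "sys", "ucs", "timing-test"]
```

On the BIG-IP, `time_command(UCS_SAVE_CMD)` could be run once with AS3 and Service Discovery installed and once without them. A return code of `None` means the save did not finish within `timeout_s`, which corresponds to the "never gets created" case.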
I talked with Mark Dittmer about this, and he suggested opening an issue ticket.
Steps To Reproduce
Steps to reproduce the behavior:
- Start with a fresh BIG-IP (or use my Ansible 101/201 UDF Blueprint)
- Install AS3 (UDF Blueprint has a version already on it)
- In my use case (using UDF), I ran the Backup and Restore playbook as described in the documentation:
- Code https://github.com/f5devcentral/f5-bd-ansible-labs/tree/main/201-F5-Advanced/Modules/00-Backup-Restore-Role
- https://github.com/f5devcentral/f5-bd-ansible-labs/blob/main/201-F5-Advanced/Modules/00-Backup-Restore-Role/roles/f5_backup_data/tasks/f5_backup_config.yaml
- Documentation for the Lab - https://clouddocs.f5.com/training/fas-ansible-use-cases/201/Modules/00-Backup-Restore-Role.html
- Typically, an error occurs after ~3-5 minutes saying to increase the timeout value.
- Use the Web Shell on the BIG-IP to access /var/log/ltm and see the error shown above.
To fix the behavior in the lab, log in to the TMUI, remove AS3/DO (including Service Discovery) via the iApps LX section, and re-run the playbook above. It then completes in ~20 seconds to a minute with no issues.
Expected Behavior
This error shouldn't occur or cause a massive delay in creating UCS files. Backing up a UCS with Ansible should take approximately the same amount of time with or without AS3.
Actual Behavior
Backups slow down to the point that Ansible fails with timeouts or never completes the tasks.
Note: UCS restore times are also affected. Restore commands sometimes hang, never fully run, or take 10+ minutes. When AS3/Service Discovery is removed, everything moves fast.