f5-appsvcs-extension

Service Discovery Causes Massive timeout issues with UCS Creation

Open • VDI-Tech-Guy opened this issue 1 year ago • 1 comment

Environment

  • Application Services Version: 1.40+ (tested the latest release and several earlier versions; all show the same issue)
  • BIG-IP Version: 17.1.1

Summary

When creating a UCS backup of a BIG-IP, the time it takes to create the UCS increases dramatically, as described in https://cdn.f5.com/product/bugtracker/ID985329.html

This stems from the following errors (logged in /var/log/ltm):

Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed task: /shared/iapp/build-package/bf9e1f3e-7826-4728-b8f0-67a2fb6b4b40: rpmbuild command failed: com.f5.rest.workers.shell.CommandExecuteException: Command execution process killed
Mar 28 12:23:32 bigip.f5demo.net err iAppsLX_save_pre[14953]: Failed to get getRPM build response within timeout for f5-service-discovery
Mar 28 12:23:32 bigip.f5demo.net info iAppsLX_save_pre[14953]: Exporting: f5-appsvcs - /var/config/rest/iapps/f5-appsvcs
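One quick way to check whether a BIG-IP is hitting this timeout is to scan /var/log/ltm for the `iAppsLX_save_pre` messages quoted above. The following is a minimal sketch, not an F5 tool. The helper names `find_rpm_timeouts` and `scan_log` are hypothetical, and it assumes the message text matches the log lines shown in this issue.

```python
import re
from typing import Iterable, List

# Matches the timeout message quoted in this issue, e.g.
# "iAppsLX_save_pre[14953]: Failed to get getRPM build response within timeout for f5-service-discovery"
TIMEOUT_RE = re.compile(
    r"iAppsLX_save_pre\[\d+\]: Failed to get getRPM build response within timeout for (\S+)"
)


def find_rpm_timeouts(lines: Iterable[str]) -> List[str]:
    """Return the package names reported as timing out during the UCS pre-save step."""
    packages = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            packages.append(match.group(1))
    return packages


def scan_log(path: str = "/var/log/ltm") -> List[str]:
    """Scan a log file (default: the BIG-IP LTM log; may require root to read)."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_rpm_timeouts(log)
```

On an affected system, `scan_log()` would be expected to return `["f5-service-discovery"]` once for each failed UCS save.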

With AS3 and Service Discovery removed, a UCS backup on a clean BIG-IP (no configuration) takes about 20 seconds. With them installed, it takes 2-5 minutes because of this timeout. I have also seen cases where the UCS is never created.

I talked with Mark Dittmer about this, and he suggested opening an issue.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Start with a fresh BIG-IP (or use my Ansible 101/201 UDF Blueprint).
  2. Install AS3 (the UDF Blueprint already has a version installed).
  3. In my use case (using UDF), I ran the Backup and Restore role as described in the documentation:
  • Code: https://github.com/f5devcentral/f5-bd-ansible-labs/tree/main/201-F5-Advanced/Modules/00-Backup-Restore-Role
  • Backup task file: https://github.com/f5devcentral/f5-bd-ansible-labs/blob/main/201-F5-Advanced/Modules/00-Backup-Restore-Role/roles/f5_backup_data/tasks/f5_backup_config.yaml
  • Lab documentation: https://clouddocs.f5.com/training/fas-ansible-use-cases/201/Modules/00-Backup-Restore-Role.html
  4. An error typically occurs (after roughly 3-5 minutes) saying to increase the timeout value.
  5. Use Webshell on the BIG-IP to open /var/log/ltm and find the error shown above.
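To quantify the slowdown outside Ansible, you can time the UCS save directly on the BIG-IP, once with AS3/Service Discovery installed and once without. This is a minimal sketch. It assumes Python 3.7+ is available where `tmsh save /sys ucs <name>` can run. The helper names and the file name `timing-test.ucs` are hypothetical.

```python
import subprocess
import time
from typing import Sequence, Tuple


def time_command(cmd: Sequence[str]) -> Tuple[int, float]:
    """Run cmd to completion and return (exit code, elapsed wall-clock seconds)."""
    start = time.monotonic()
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    elapsed = time.monotonic() - start
    return result.returncode, elapsed


# Hypothetical UCS name; tmsh stores UCS archives under /var/local/ucs/ by default.
UCS_SAVE_CMD = ["tmsh", "save", "/sys", "ucs", "timing-test.ucs"]


def report_ucs_save_time() -> None:
    """Time one UCS save and print the result."""
    code, seconds = time_command(UCS_SAVE_CMD)
    print(f"tmsh exit code {code}, UCS save took {seconds:.1f} s")
```

If you run `report_ucs_save_time()` before and after removing the packages, you get a direct comparison with the ~20 seconds vs. 2-5 minutes reported in this issue.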

To work around this in the lab, log in to the TMUI and remove AS3/DO, including Service Discovery, from the iApps LX section. After that, re-running the playbook above completes with no issues in roughly 20 seconds to a minute.
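In automated labs, the same removal can probably be scripted against the iControl REST package-management endpoint (`/mgmt/shared/iapp/package-management-tasks`) that the AS3 documentation uses for install and uninstall. The sketch below only builds the requests and does not send them. The host name is a placeholder. The full package name is hypothetical and should be taken from the output of a `QUERY` task.

```python
import json
import urllib.request

# Placeholder management address; replace with the real BIG-IP host.
BIGIP_HOST = "bigip.example.net"
TASKS_URL = f"https://{BIGIP_HOST}/mgmt/shared/iapp/package-management-tasks"


def package_task_request(operation: str, package_name: str = "") -> urllib.request.Request:
    """Build (but do not send) a package-management task request.

    operation is "QUERY" to list installed packages or "UNINSTALL" to remove one.
    """
    body = {"operation": operation}
    if package_name:
        body["packageName"] = package_name
    return urllib.request.Request(
        TASKS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Step 1: list installed packages.
query_req = package_task_request("QUERY")
# Step 2: uninstall by exact name (hypothetical placeholder; use the name QUERY reports).
uninstall_req = package_task_request("UNINSTALL", "f5-appsvcs-<version>.noarch")
```

Sending these requests still requires authentication. The POST creates an asynchronous task, so poll the task's status until it finishes before re-running the backup.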

Expected Behavior

This error should not occur or add a large delay to UCS file creation. Backing up a UCS with Ansible should take approximately the same amount of time with or without AS3.

Actual Behavior

Backups slow down to the point that Ansible fails with timeouts or never completes the tasks.

VDI-Tech-Guy • Mar 28 '24 19:03

Adding a note that UCS restore times are also affected. Restore commands sometimes hang, never fully run, or take 10+ minutes. With AS3/Service Discovery removed, everything runs quickly.
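Restores can hang rather than fail. A wrapper that enforces an upper bound on run time makes a hang visible instead of blocking an automation run indefinitely. This is a minimal sketch that assumes `tmsh load /sys ucs <name>` is the restore command. The helper names, the file name, and the 600-second limit are illustrative. When the limit is hit, the child process is killed, which could leave a restore half-applied, so treat this as a lab diagnostic only.

```python
import subprocess
from typing import Optional, Sequence


def run_with_limit(cmd: Sequence[str], limit_s: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it exceeded limit_s seconds.

    subprocess.run kills the child process when the timeout expires.
    """
    try:
        return subprocess.run(list(cmd), timeout=limit_s).returncode
    except subprocess.TimeoutExpired:
        return None


# Hypothetical UCS file name; the 600 s limit mirrors the "10+ minutes" restores reported here.
RESTORE_CMD = ["tmsh", "load", "/sys", "ucs", "timing-test.ucs"]


def restore_or_report() -> None:
    """Attempt a restore and report whether it finished within the limit."""
    code = run_with_limit(RESTORE_CMD, limit_s=600)
    if code is None:
        print("UCS restore did not finish within 10 minutes")
    else:
        print(f"UCS restore exited with code {code}")
```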

VDI-Tech-Guy • May 28 '24 16:05