
Improving `traffic_ctl config reload`

Open cmcfarlen opened this issue 6 months ago • 1 comments

I would like to propose a new way for ATS to handle config reload that will hopefully provide more feedback and information to the user.

Current issues:

  • `traffic_ctl config reload` asks traffic_server to reload its config. The reload command first enumerates the files it watches to check for changes, and then signals each subsystem with changes that it should reconfigure itself. The file checks happen while the traffic_ctl client is waiting, so if that step is slow the client can time out, which gives an ambiguous result. When the timeout happens, the reload still proceeds on the server.
  • There isn't any structured way to get the status or results of a reload. Tooling is left to parse log files to determine when, and whether, the reload finished successfully.

New proposal:

  • Initiating a reconfigure with `traffic_ctl config reload` should return status immediately, without doing more work than necessary to either start a reconfigure or fail (if a reconfigure is already running). The status result should indicate whether a reconfigure was started and return a token representing the current reconfigure operation (i.e., either the token for the ongoing reconfigure, or the token for the new reconfigure if none is currently running). All other work should be done in a background continuation (including spinning through the changed files, even records.yaml).
  • ATS should keep track of the outcome and details of the last reconfigure. This result should be viewable as structured data (JSON) from a traffic_ctl command. The information should list the files changed and the result of whatever reconfigure operation resulted from each change (the callback). It should also indicate the status of the reconfigure operation (success, fail, or ongoing) and details about records.yaml validation. Bonus points if there is some indication of progress for an incomplete operation (e.g., “10 of 25 files reconfigured”).
  • A client that requests a reconfigure can use the returned token and then poll `traffic_ctl config status` to positively determine when the reconfigure operation is complete.
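To make the three points above concrete, here is a minimal Python model of the bookkeeping they describe. It is only a sketch, not ATS code: `ReloadManager`, the `find_changed` and `reconfigure` callables, and every JSON field name (`token`, `status`, `total`, `done`, `files`) are hypothetical, and a plain thread stands in for the background continuation.

```python
import json
import threading
import uuid


class ReloadManager:
    """Hypothetical model of the proposed reload bookkeeping (not ATS code)."""

    def __init__(self, find_changed, reconfigure):
        self._find_changed = find_changed  # callable() -> list of changed file paths
        self._reconfigure = reconfigure    # callable(path); raises on failure
        self._lock = threading.Lock()
        self._current = None               # status record of the most recent reload

    def start(self):
        """Return (started, token) immediately; all real work runs in the background."""
        with self._lock:
            if self._current is not None and self._current["status"] == "ongoing":
                # A reload is already running: hand back its token instead.
                return False, self._current["token"]
            status = {
                "token": uuid.uuid4().hex,
                "status": "ongoing",
                "total": None,  # unknown until the changed files are enumerated
                "done": 0,
                "files": [],
            }
            self._current = status
        threading.Thread(target=self._run, args=(status,), daemon=True).start()
        return True, status["token"]

    def _run(self, status):
        # The (possibly slow) file enumeration also happens off the request path.
        files = list(self._find_changed())
        with self._lock:
            status["total"] = len(files)
        failed = False
        for path in files:
            try:
                self._reconfigure(path)
                result = "success"
            except Exception as exc:
                result = f"fail: {exc}"
                failed = True
            with self._lock:
                status["files"].append({"file": path, "result": result})
                status["done"] += 1
        with self._lock:
            status["status"] = "fail" if failed else "success"

    def status_json(self):
        """Structured view of the last (or ongoing) reload."""
        with self._lock:
            return json.dumps(self._current)
```

In this model the `done` and `total` fields are what would back a progress message such as “10 of 25 files reconfigured”, and a second `start()` during a running reload returns the existing token rather than queuing another reload.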

This change in behavior could be made transparent to `traffic_ctl config reload` if we adjust the client side of that command to issue the start-reconfigure request and then poll for status. This way the config reload command could report progress if requested, or at least give better information should there be issues with the configuration (i.e., it could wait until the reconfigure has started, or wait until it is completely finished, before returning status).
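A rough sketch of that client-side behavior, in Python for brevity: start the reload, then poll until the status for the returned token is no longer ongoing. Everything here is an assumption for illustration; `start_reload` and `get_status` stand in for whatever RPC calls the client would make, and the `token` and `status` fields are hypothetical names.

```python
import time


def wait_for_reload(start_reload, get_status, timeout=30.0, interval=0.5,
                    on_progress=None):
    """Start a reload, then poll until it finishes or the timeout expires.

    start_reload: callable() -> (started, token)                    (hypothetical)
    get_status:   callable() -> dict with "token" and "status" keys (hypothetical)
    Returns (started, final_status). A timeout here only means the client
    stopped waiting; the server-side reload is unaffected.
    """
    started, token = start_reload()
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if on_progress is not None:
            on_progress(status)  # e.g. print "10 of 25 files reconfigured"
        if status["token"] == token and status["status"] != "ongoing":
            return started, status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"reload {token} still ongoing after {timeout}s")
        time.sleep(interval)
```

Because the token identifies the operation, a client that times out can poll again later with the same token and still get an unambiguous answer.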

See the sequence diagram below, which shows the new proposal. Consider a totally hypothetical mechanism where a virtual filesystem is used to access secrets (certificates) that are stored off host and fetched over a network. With the new mechanism, the traffic_ctl client would not have to wait on a slow network and could also receive progress information.

[Image: sequence diagram of the proposed reload flow]

cmcfarlen, Jun 30 '25 19:06

I ~~may be~~ am working on this as soon as I have some time.

brbzull0, Jul 07 '25 18:07