guild-operators
chain validation and ledger replay analysis tool
DRAFT for use in #1515
name: Log parsing for chain validation and ledger replay
about: logAnalyze.sh provides manual log analysis for chain validation and ledger replay
title: ''
labels: enhancement
assignees: @Scitz0 @rdlrt
Description
Adds a new script, logAnalyze.sh, that allows manual analysis of log files to find the start and end times and percentage of completion for chain validations and ledger replays.
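As a rough illustration of this kind of log analysis, the sketch below reports the first timestamp, last timestamp and latest percentage for one startup phase. It is not the code in logAnalyze.sh. It assumes, hypothetically, that each progress line starts with a bracketed timestamp and ends with `Progress: NN.NN%`; the function name `phase_summary` and the sample messages are illustrative only and should be adapted to real node logs.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: summarise one startup phase from a node log.
# ASSUMPTION: each progress line looks like
#   [YYYY-MM-DD HH:MM:SS.ss UTC] <phase message>. Progress: NN.NN%
# This format is illustrative only; adapt the patterns to your real logs.

# phase_summary LOGFILE PHASE_REGEX
# Prints "start=<ts> end=<ts> progress=<pct>%" using the first and last
# matching lines, or prints nothing if the phase never appears in the log.
phase_summary() {
  local logfile=$1 phase_re=$2
  awk -v re="$phase_re" '
    $0 ~ re && /Progress: [0-9.]+%/ {
      # Timestamp is the text between the first "[" and the first "]".
      ts = substr($0, index($0, "[") + 1, index($0, "]") - index($0, "[") - 1)
      if (first == "") first = ts
      last = ts
      # Percentage is the number following "Progress: ".
      pct = $0
      sub(/.*Progress: /, "", pct)
      sub(/%.*/, "", pct)
    }
    END { if (first != "") printf "start=%s end=%s progress=%s%%\n", first, last, pct }
  ' "$logfile"
}
```

For example, `phase_summary node.log 'Validating chunk'` would summarise a (hypothetical) chain-validation phase and `phase_summary node.log 'Replayed block'` a ledger-replay phase. Comparing the reported progress across two runs shows whether startup is still advancing.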
Where should the reviewer start?
- Review the Commits and Files changed tabs.
- Help the discussion in #1515 settle the direction for the tool before the draft is converted into a full PR.
Motivation and context
- Increases observability after a node restart, while gLiveView.sh reports "starting":
  - during chain validation
  - during ledger replay events
- Provides a way to determine whether node startup is progressing
Which issue does it fix?
No current issue.
How has this been tested?
- testnet node & relay logs (v1.34, v1.35)
- mainnet node & relay logs (v1.34, v1.35)
I left a few notes in a Telegram DM, but as it's a separate script, I'll give you free rein to work out something you feel adds value. I like that this could add value to gLiveView by letting it display a more useful startup message.
I added notes on the script in a Telegram DM, such as adding service deployment like cnode.sh has, and the auto-update directions already mentioned. In general, though, it looks good to me.
We can hold the merge until @TrevorBenson gives his OK.
@Scitz0 @rdlrt I've implemented a few more suggestions:
- startupLogMonitor.sh now uses checkUpdate()
- deploy-as-systemd.sh now includes a ${vname}-startup-logmonitor.service service
I will re-request reviews to make sure everyone is happy with the changes first.
On a side note, I haven't tested deploy-as-systemd.sh in quite a while, since I use containers. Am I forgetting, or just missing, the part that prevents updates to the guild-ops repo from causing the service to hang on interactive input during the checkUpdate call? Or is it left to the operator to define/export SKIP_UPDATE for any service they want to start normally after the repo updates?
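One option for the question above, sketched under stated assumptions: if checkUpdate() skips its prompt when SKIP_UPDATE=Y is present in the environment (the assumption the question itself raises), the unit file written for the service could set that variable so a repo update never blocks startup on interactive input. The function `write_logmonitor_unit`, the unit contents and the example paths are hypothetical and are not taken from deploy-as-systemd.sh.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: write a unit file for the startup log monitor.
# ASSUMPTION: the monitored script skips its interactive update prompt
# when SKIP_UPDATE=Y is present in its environment.

# write_logmonitor_unit OUTDIR VNAME SCRIPT_PATH RUN_AS_USER
# Writes <OUTDIR>/<VNAME>-startup-logmonitor.service and prints its path.
write_logmonitor_unit() {
  local outdir=$1 vname=$2 script=$3 user=$4
  local unit="${outdir}/${vname}-startup-logmonitor.service"
  # Unquoted EOF so ${vname}, ${user} and ${script} are expanded.
  cat > "$unit" <<EOF
[Unit]
Description=Cardano Node - ${vname} startup log monitor
After=${vname}.service

[Service]
Type=simple
User=${user}
# Non-interactive: never wait for an update prompt under systemd
Environment=SKIP_UPDATE=Y
ExecStart=${script}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
  echo "$unit"
}
```

Usage (paths and user are placeholders): `write_logmonitor_unit /tmp cnode /opt/cardano/cnode/scripts/startupLogMonitor.sh cardano`. Setting the variable per unit keeps the choice explicit for each service instead of relying on the operator's login environment.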
@Scitz0 Requesting a final review before squash & merge, as the code has changed. Thank you.
Looks good. The only thing left is to switch to CNODE_VNAME, as cnode.sh does; it is already set in env, so there's no need to create this variable.
EDIT: Also, please add BATCH_AUTO_UPDATE as a user-configurable variable at the top, like in topologyUpdater.sh.
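A minimal sketch of both requests, assuming a layout only loosely modeled on the user-variable header of other guild scripts (the banner wording here is illustrative, not copied from topologyUpdater.sh). The helper names `update_mode` and `service_name` are hypothetical; the sketch reads CNODE_VNAME from the environment rather than defining its own variable, and fails loudly if it is missing.

```bash
#!/usr/bin/env bash
######################################
# User Variables - Change as desired #
######################################

#BATCH_AUTO_UPDATE=N   # Set to Y to update the script without an interactive prompt

######################################
# Script logic (hypothetical sketch)  #
######################################

# Default to interactive behaviour when the user has not set the variable.
: "${BATCH_AUTO_UPDATE:=N}"

# update_mode: prints "auto" when updates may run unattended, else "prompt".
update_mode() {
  if [ "${BATCH_AUTO_UPDATE}" = "Y" ]; then
    echo auto
  else
    echo prompt
  fi
}

# service_name: derives the unit name from CNODE_VNAME, which env is
# expected to provide (ASSUMPTION); no local vname variable is created.
service_name() {
  echo "${CNODE_VNAME:?CNODE_VNAME is not set (normally provided by env)}-startup-logmonitor.service"
}
```

Keeping BATCH_AUTO_UPDATE commented out at the top leaves the interactive default in place while giving operators one obvious line to change for unattended updates.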
:+1: Will do.