featureprofiles

tr5zrpdelay

Open hmod2001 opened this issue 3 months ago • 4 comments

The Test400ZRTunableFrequency and related optical channel tests are experiencing intermittent failures with two main failure patterns:

Statistical Validation Failures

Optical-Channel: carrier-frequency-offset min: -1 greater than carrier-frequency-offset avg: -13

This error occurs when telemetry statistical values (min/max/avg) are inconsistent, due to:

- Race conditions during telemetry collection
- Stale/cached telemetry data being used for validation
- Device updating statistical values non-atomically

Interface Timeout Failures

context deadline exceeded

This occurs when optical interfaces take longer than the configured timeout to come up after configuration changes.

Root Causes:

- The test collects telemetry immediately after configuration, but optical modules need time to stabilize their statistical measurements
- Insufficient stabilization time: the 90-second timeout and 80-second stabilization delays are insufficient for optical channel convergence
- Floating-point precision issues in statistical comparisons

This PR implements a targeted fix addressing the specific failure patterns:

Enhanced Telemetry Stabilization

- Increased timeout from 90 seconds to 3 minutes for optical interface convergence
- Increased stabilization delays after configuration changes (from 80s to 100s before validation)
- Extended telemetry wait time to allow statistical measurements to stabilize

Sample Flushing for Fresh Data

- Flushes old/stale samples from telemetry streams before validation
- Validates data sanity before using telemetry for statistical comparisons
- Retry logic for telemetry collection with up to 3 attempts

Robust Statistical Validation

- Proper floating-point handling with rounding to 1 decimal place
- Statistical tolerance (±0.1) for min/max/avg comparisons
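A small sketch of the rounding-plus-tolerance comparison follows. The helper names (`tenths`, `lessOrEqual`, `checkStats`) are hypothetical, and comparing in integer tenths is a choice made here to keep the ±0.1 slack exact; the PR may implement the same idea differently.

```go
package main

import (
	"fmt"
	"math"
)

// tenths rounds v to one decimal place and returns it as a count of tenths,
// so later comparisons are exact integer comparisons.
func tenths(v float64) int64 { return int64(math.Round(v * 10)) }

// lessOrEqual reports whether a <= b after rounding both to one decimal
// place, allowing 0.1 (one tenth) of slack.
func lessOrEqual(a, b float64) bool { return tenths(a) <= tenths(b)+1 }

// checkStats verifies min <= avg <= max under the rounding and tolerance above.
func checkStats(minV, avgV, maxV float64) error {
	if !lessOrEqual(minV, avgV) {
		return fmt.Errorf("min: %v greater than avg: %v", minV, avgV)
	}
	if !lessOrEqual(avgV, maxV) {
		return fmt.Errorf("avg: %v greater than max: %v", avgV, maxV)
	}
	return nil
}

func main() {
	// Values that differ only by rounding noise are accepted.
	fmt.Println(checkStats(-12.96, -13.0, -12.9))
	// The inconsistency from the failure report is still rejected.
	fmt.Println(checkStats(-1, -13, 0))
}
```

The tolerance only absorbs differences of a tenth or less, so a genuinely inconsistent reading such as min -1 against avg -13 still fails.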

hmod2001, Oct 16 '25 23:10

Pull Request Functional Test Report for #4709 / 1bc3e06e7d1e406412121b76758383a8481bfcdc

Virtual Devices

| Device | Test Documentation |
| --- | --- |
| Arista cEOS | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Cisco 8000E | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Cisco XRd | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Juniper ncPTX | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Nokia SR Linux | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Openconfig Lemming | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |

Hardware Devices

| Device | Test Documentation |
| --- | --- |
| Arista 7808 | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Cisco 8808 | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Juniper PTX10008 | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |
| Nokia 7250 IXR-10e | TRANSCEIVER-5.1: Configuration: 400ZR channel frequency, output TX launch power and operational mode setting. |
| | TRANSCEIVER-5.2: Configuration: 400ZR_PLUS channel frequency, output TX launch power and operational mode setting. |


OpenConfigBot, Oct 16 '25 23:10

tr5zrplogs.txt

hmod2001, Oct 16 '25 23:10

Summary

Fixed intermittent test failures caused by comparing instant telemetry values against min/max/avg statistics. The device does not update these values atomically: the instant value and the statistics (min/max/avg) are read at slightly different times, causing false failures like "max: -8.8 less than instant: -9.5" when the instant value comes from the current sampling window but the statistics still contain data from previous states.

Changed the validation logic to check only the internal consistency of the statistics (min ≤ avg ≤ max) and to validate instant values independently against the configured settings, which is timing-independent and more robust.

Also increased the telemetry stabilization wait time from 30-45s to 60s so the statistics fully reflect stable operation (6 sampling windows instead of 3-4.5), relaxed the tolerance from 2.0 to 3.0 to account for natural optical variation, and added debug logging for easier troubleshooting.

Changes

- Removed instant vs min/max/avg comparisons (timing-dependent, causes race conditions)
- Changed to statistics-only consistency validation (min ≤ avg ≤ max)
- Increased telemetryWaitTime to 60s in both ZR and ZRP tests
- Added statisticsTolerance constant (3.0) for relaxed comparisons
- Relaxed output power tolerance to ±2 dBm
- Added logTelemetryValues() helper for debugging
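The revised validation could look roughly like the sketch below. The constant values (3.0 and ±2 dBm) come from the list above; the `reading` type, the `validate` function, and the way `statisticsTolerance` is applied as additive slack on each ordering check are assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"math"
)

const (
	statisticsTolerance = 3.0 // slack for the min/avg/max ordering (value from the PR)
	powerTolerance      = 2.0 // ±2 dBm around the configured output power
)

// reading is a hypothetical snapshot of one telemetry leaf.
type reading struct{ Instant, Min, Avg, Max float64 }

// validate checks the statistics for internal consistency and, separately,
// the instant value against the configured target. It deliberately never
// compares Instant with Min/Avg/Max, since those are not read atomically.
func validate(r reading, target float64) []error {
	var errs []error
	if r.Min > r.Avg+statisticsTolerance {
		errs = append(errs, fmt.Errorf("min: %v greater than avg: %v", r.Min, r.Avg))
	}
	if r.Avg > r.Max+statisticsTolerance {
		errs = append(errs, fmt.Errorf("avg: %v greater than max: %v", r.Avg, r.Max))
	}
	if math.Abs(r.Instant-target) > powerTolerance {
		errs = append(errs, fmt.Errorf("instant: %v not within ±%v of target: %v",
			r.Instant, powerTolerance, target))
	}
	return errs
}

func main() {
	// Instant lies outside [min, max] because it was sampled in a newer
	// window; with the statistics-only check this no longer fails.
	ok := reading{Instant: -8.5, Min: -10.2, Avg: -9.9, Max: -8.8}
	fmt.Println(validate(ok, -9.0))

	// Inconsistent statistics and an off-target instant value both fail.
	bad := reading{Instant: -5, Min: -1, Avg: -13, Max: 0}
	fmt.Println(validate(bad, -9.0))
}
```

Keeping the two checks independent means a sample taken mid-update can no longer fail merely because the instant value and the statistics belong to different sampling windows.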

Testing

✅ All tests pass consistently on local hardware
✅ No false failures with new validation logic
✅ Debug logging provides clear visibility

hmod2001, Nov 04 '25 22:11

Uploading tr5zrplogs.txt…

hmod2001, Nov 04 '25 22:11