[Feat]: Display banner warning and chart message if a node has an issue with its clock

Open hugovalente-pm opened this issue 3 years ago • 3 comments

Problem

When a node's clock is set incorrectly, retrieving data for charts may fail (whether this has further impacts is still to be confirmed). The issue occurs when Netdata Cloud triggers requests that use relative timestamps.
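A minimal sketch of one plausible way this can go wrong, assuming a relative window (for example "the last 10 minutes") is resolved against the node's own clock. The helper `resolve_window` and its `after`/`before` parameters are hypothetical and only illustrate the arithmetic; they are not taken from the Agent or Cloud code.

```python
from datetime import datetime, timedelta, timezone


def resolve_window(now, after, before=0):
    """Turn relative offsets (seconds, zero or negative) into absolute times.

    `before` is relative to `now`; `after` is relative to the resolved end.
    """
    end = now + timedelta(seconds=before)
    start = end + timedelta(seconds=after)
    return start, end


def overlaps(a, b):
    """True if two (start, end) windows share any instant."""
    return a[0] < b[1] and b[0] < a[1]


cloud_now = datetime(2022, 6, 22, 11, 0, tzinfo=timezone.utc)
node_now = cloud_now - timedelta(hours=2)  # assumed: node clock is 2 hours behind

# The same relative request resolves to two different absolute windows.
cloud_window = resolve_window(cloud_now, after=-600)
node_window = resolve_window(node_now, after=-600)

# The node answers with data stamped inside its own window, which falls
# entirely outside the period the chart is trying to draw.
windows_overlap = overlaps(cloud_window, node_window)
```

With a two-hour skew the two windows do not overlap at all, which would match the reported symptom: the node replies with data, but none of it lands in the period shown on the chart.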

Description

This is an example of an issue reported by a user (discord thread) where the Agent clock wasn't correct and no data was being displayed on Cloud: [image]

After the node's clock was fixed data was being displayed correctly on Netdata Cloud.

If Netdata Cloud can show users that their node(s) have clock issues, it can help them understand why some data isn't displayed. The charts impacted by this should also flag it to the user in some way.

Importance

really want

Value proposition

  1. Nodes with an issue on the clock will be identified on Netdata Cloud
  2. Charts will be able to present to the user that no data is visible due to some issue on the node's clock

Proposed implementation

UI/UX:

  • For the nodes identified as having clock issues, we could show a top banner on Netdata Cloud saying that X nodes have an issue with their clocks and that this may affect retrieving data for the charts (a banner similar to the one used for the new architecture migration request to upgrade nodes)
  • On the charts we should be able to display that no data is presented for the period, even though the Agent had replied with some data

Agent/Netdata Cloud:

  • Agent should publish some info in the NodeInfo message that would allow the check on the Netdata Cloud side
  • Netdata Cloud BE should check, when NodeInfo messages are processed, whether the node has an issue with its clock
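As a rough sketch of the backend check and the banner count described above: the `NodeInfo` class, its `reported_at` field, the 60-second tolerance, and the banner wording below are all assumptions for illustration, not the actual message schema or Cloud code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance; it has to absorb normal network and queueing delay.
MAX_SKEW = timedelta(seconds=60)


@dataclass
class NodeInfo:
    node_id: str
    reported_at: datetime  # assumed: Agent's own UTC time when the message was sent


def nodes_with_clock_issue(infos, received_at, max_skew=MAX_SKEW):
    """Return ids of nodes whose reported time is too far from the Cloud's time."""
    return [
        info.node_id
        for info in infos
        if abs(received_at - info.reported_at) > max_skew
    ]


def banner_text(infos, received_at):
    """Build the banner message, or None when no node is affected."""
    affected = nodes_with_clock_issue(infos, received_at)
    if not affected:
        return None
    return (
        f"{len(affected)} node(s) have an issue with their clocks; "
        "this may affect retrieving data for the charts."
    )


now = datetime(2022, 6, 22, 11, 0, tzinfo=timezone.utc)
infos = [
    NodeInfo("node-a", now - timedelta(seconds=2)),  # within tolerance
    NodeInfo("node-b", now - timedelta(hours=2)),    # clock 2 hours behind
]
message = banner_text(infos, now)
```

Using the absolute difference flags clocks that run ahead as well as behind. The right tolerance would need to be agreed on, since a value that is too small would flag healthy nodes on slow links.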

hugovalente-pm • Jun 22 '22 11:06

Based on an issue @MrZammler helped troubleshoot, the idea came up to identify these issues directly in Netdata Cloud and warn users about them.

@amalkov @MrZammler @papazach @novykh the ticket describes the overall idea, we need to iron out details, but please check and give some feedback

hugovalente-pm • Jun 22 '22 11:06

Hi @hugovalente-pm !

As a first step, the UpdateNodeInfo message already carries an indication: a field called updated_at. This is the current time when the message is sent, and it could be used to deduce a time difference between agent and cloud.
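A minimal sketch of that deduction, assuming updated_at arrives as (or is parsed into) a timezone-aware timestamp. The function name `clock_offset` and the example values are hypothetical, and the wire format of the field is not confirmed here.

```python
from datetime import datetime, timedelta, timezone


def clock_offset(updated_at, received_at):
    """Signed offset of the agent clock relative to the cloud clock.

    Negative means the agent is behind. Naive timestamps are rejected,
    because without a UTC offset the comparison would be ambiguous.
    """
    if updated_at.tzinfo is None or received_at.tzinfo is None:
        raise ValueError("timestamps must be timezone-aware")
    return updated_at.astimezone(timezone.utc) - received_at.astimezone(timezone.utc)


received = datetime(2022, 6, 28, 12, 0, 0, tzinfo=timezone.utc)

# Agent clock roughly two hours behind the cloud.
behind = clock_offset(datetime(2022, 6, 28, 10, 0, 5, tzinfo=timezone.utc), received)

# Same instant as `received`, expressed with a +03:00 offset: no skew.
same_instant = clock_offset(
    datetime(2022, 6, 28, 15, 0, 0, tzinfo=timezone(timedelta(hours=3))),
    received,
)
```

The measured offset also includes the delivery delay of the message, so small values should not be treated as a clock problem.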

MrZammler • Jun 28 '22 12:06

@MrZammler thanks, sounds good, just extra check... is it always in UTC?

amalkov • Jun 29 '22 20:06