metacatui
Member Node Status Dashboard
This is a low priority and needs discussion.
Develop audit dashboard (Low priority): Design and develop a dashboard UI that displays the audit status as of the last run, ideally with some extra detail for objects in an invalid state.
This could also include statuses of object synchronization, replication, indexing etc.
The terminology below needs some changes, but it gives an idea of what we are going for:
ESS-DIVE Data Audit Report
Includes published data objects created between 2018-01-01 00:00:00 and 2019-08-16 00:00:00.
This audit report is produced on a monthly basis, or as-needed, to report the overall health of the ESS-DIVE repository. The source of data for this report is the output of continuous processes that replicate and independently verify the intactness of data objects at each ESS-DIVE repository instance.
Legend:
HEALTHY: Verified intact on primary and all replica instances.
DEGRADED: Verified intact on primary, but verified corrupt (or unable to verify) on at least one replica instance; expected to self-heal under normal conditions.
AT RISK: Verified corrupt (or unable to verify) on primary and/or all replica instances; intervention needed to restore data objects from offline (backup) copies.
********************************************************
Summary:
Data objects with status HEALTHY: 5851 (96.39%)
Data objects with status DEGRADED: 14 (0.23%)
Data objects with status AT RISK: 205 (3.38%)
----------------------------------------------------
Total data objects: 6070 (100%)
********************************************************
Detail on DEGRADED data objects:
Replication partially complete (queued) (14)
Replication partially complete (failed) (0)
Checksum validation partially complete (queued) (0)
Checksum validation partially complete (failed) (0)
********************************************************
Detail on AT RISK data objects:
Replication incomplete; NO replicas exist (queued) (21)
Replication incomplete; NO replicas exist (failed) (0)
Checksum validation incomplete; NO replicas verified (queued) (184)
Checksum validation incomplete; NO replicas verified (failed) (0)
Checksum validation incomplete; source corrupt (0)
Auditing data incomplete; NO replicas verified (0)
********************************************************
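The legend and summary in the sample report above can be sketched in a few lines of Python. This is only an illustration of the classification rules as written in the legend, not Hesham's actual report code; the names `Status`, `classify`, and `summarize` are hypothetical, and an object with no replicas at all is assumed to be AT RISK, in line with the "NO replicas exist" detail rows.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Iterable, Sequence, Tuple


class Status(Enum):
    HEALTHY = "HEALTHY"
    DEGRADED = "DEGRADED"
    AT_RISK = "AT RISK"


def classify(primary_intact: bool, replicas_intact: Sequence[bool]) -> Status:
    """Apply the legend. False means 'verified corrupt or unable to verify'."""
    # Primary is bad, or no replica is verified intact (including the case
    # where no replicas exist): restore from backup is needed.
    if not primary_intact or not any(replicas_intact):
        return Status.AT_RISK
    # Primary and every replica verified intact.
    if all(replicas_intact):
        return Status.HEALTHY
    # Primary intact, at least one (but not every) replica is bad.
    return Status.DEGRADED


def summarize(statuses: Iterable[Status]) -> Dict[str, Tuple[int, float]]:
    """Return {status name: (count, percent of total)} for the summary block."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {
        s.value: (counts[s], round(100 * counts[s] / total, 2) if total else 0.0)
        for s in Status
    }
```

Feeding `summarize` the counts from the sample report (5851, 14, and 205 objects) gives the same percentages shown in the summary block.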
Val and Cory and I discussed this a bit, and decided we need to consider a few components:
- A server-side API to provide aggregated statistics; the metrics-service may be appropriate. Consider folding Hesham's report code into the metrics-service as another type of metric to be queried.
- Wireframes of a MetacatUI view that would probably be an addition to the MN profile page, available to MN operator subjects.
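To make the first component more concrete, here is a rough sketch of what an aggregated-statistics response could look like if the audit report were exposed as another metric type. Nothing here reflects the real metrics-service API: the function name `build_audit_payload` and every field name (`asOf`, `total`, `statuses`, `detail`) are assumptions for discussion only, and the numbers are taken from the sample report.

```python
import json
from typing import Dict


def build_audit_payload(as_of: str,
                        counts: Dict[str, int],
                        details: Dict[str, Dict[str, int]]) -> dict:
    """Assemble a hypothetical JSON-serializable audit summary."""
    total = sum(counts.values())
    return {
        "asOf": as_of,      # hypothetical field: end of the audited period
        "total": total,
        "statuses": [
            {
                "status": name,
                "count": n,
                "percent": round(100 * n / total, 2) if total else 0.0,
                # Per-status breakdown, e.g. the "Detail on ..." rows.
                "detail": details.get(name, {}),
            }
            for name, n in counts.items()
        ],
    }


payload = build_audit_payload(
    "2019-08-16T00:00:00",
    {"HEALTHY": 5851, "DEGRADED": 14, "AT RISK": 205},
    {
        "DEGRADED": {"Replication partially complete (queued)": 14},
        "AT RISK": {
            "Replication incomplete; NO replicas exist (queued)": 21,
            "Checksum validation incomplete; NO replicas verified (queued)": 184,
        },
    },
)
print(json.dumps(payload, indent=2))
```

A MetacatUI view could then render the summary and the detail rows from a single response, without needing to know how the audit itself is computed.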
For reference, some related info that can be grabbed with existing monitoring:
https://monitor.dataone.org/status/
https://cn.dataone.org/processing_metrics.txt
The second is really a proof of concept but shows some valuable queue information.