
Clean up metrics

Open · DRiKE opened this issue 1 year ago • 1 comment

With significant changes in many of the different components, some of the existing metrics do not make sense anymore. We probably also need a couple of new metrics. This issue tracks what we had in 0.1 and what we want for 0.2.

Some notes on the (crossed-out) metrics below:

  • with the prefix store now doing the UserMergeUpdate, tracking the merge_update-related metrics from Rotonda itself is no longer possible or sensible: we can't distinguish an update from an insert. A possible new metric here is the number of compare-and-swap attempts the store had to perform, though other numbers might be more insightful;
  • none of the task_-related metrics (managed by TokioTaskMetrics) were actually being tracked;
  • eventually all code related to the BMP state machine will move into routecore; we will revisit those metrics when that happens;
  • we might want some diagnostics for the new ingress::Register. Moreover, the metrics output probably needs to consult the Register to populate the labels for the endpoint output.
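
To make the compare-and-swap idea from the first note concrete, here is a minimal sketch of counting attempts and retries around a CAS loop. It is not the prefix store's actual code: `CasMetrics`, `update_with_metrics` and the use of a plain `AtomicU64` as the stored value are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters; the names are illustrative, not Rotonda's.
#[derive(Default)]
pub struct CasMetrics {
    /// Every compare-and-swap that was tried, successful or not.
    pub attempts: AtomicU64,
    /// Attempts that failed because another writer got there first.
    pub retries: AtomicU64,
}

/// Apply `f` to the value in `cell` using a compare-and-swap loop,
/// recording each attempt and each retry in `metrics`.
/// Returns the value that was finally stored.
pub fn update_with_metrics(
    cell: &AtomicU64,
    metrics: &CasMetrics,
    f: impl Fn(u64) -> u64,
) -> u64 {
    let mut current = cell.load(Ordering::Acquire);
    loop {
        metrics.attempts.fetch_add(1, Ordering::Relaxed);
        let new = f(current);
        match cell.compare_exchange(current, new, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => return new,
            Err(observed) => {
                // Someone else changed the value; retry from what we observed.
                metrics.retries.fetch_add(1, Ordering::Relaxed);
                current = observed;
            }
        }
    }
}
```

A high ratio of retries to attempts would hint at contention on the same entry, which may be a more useful signal than the raw attempt count.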

0.1 metrics (taken from the /status/ endpoint). A crossed-out metric will not be included in v0.2:

  • version: rotonda/0.1.1-dev
  • bgp-in num_updates: 3
  • bgp-in num_dropped_updates: 0
  • bgp-in last_update: 2024-07-30 10:40:47.909446464 UTC
  • bgp-in since_last_update: 41
  • bgp-in update_set_size: 3
  • bgp-in bgp_tcp_in_listener_bound_count: 1
  • bgp-in bgp_tcp_in_connection_accepted_count: 1
  • bgp-in bgp_tcp_in_connection_lost_count: 1
  • bgp-in bgp_tcp_in_disconnect_count: 0
  • bmp-in num_updates: 1
  • bmp-in num_dropped_updates: 0
  • bmp-in last_update: 2024-07-30 10:40:47.908768858 UTC
  • bmp-in since_last_update: 41
  • bmp-in update_set_size: 0
  • bmp-in bmp_tcp_in_listener_bound_count: 1
  • bmp-in bmp_tcp_in_connection_accepted_count: 1
  • bmp-in bmp_tcp_in_connection_lost_count: 1
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Route Monitoring: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Statistics Report: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Peer Down Notification: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Peer Up Notification: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Initiation Message: 1
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Termination Message: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_received router=unknown msg_type=Route Mirroring Message: 0
  • bmp-in bmp_tcp_in_num_receive_io_errors router=unknown: 0
  • bmp-in bmp_tcp_in_num_bmp_messages_processed router=unknown: 1
  • bmp-in bmp_in_num_invalid_bmp_messages router=unknown: 0
  • bmp-in bmp_num_connected_routers: 2
  • ~~bmp-in bmp_state_machine_state router=cavefish: Dumping~~
  • bmp-in bmp_state_num_received_prefixes router=cavefish: 0
  • bmp-in bmp_state_num_stored_prefixes router=cavefish: 0
  • bmp-in bmp_state_num_bmp_route_monitoring_msgs_with_unknown_peer router=cavefish: 0
  • bmp-in bmp_state_num_bgp_updates_reparsed_due_to_incorrect_header_flags router=cavefish: 0
  • bmp-in bmp_state_num_unprocessable_bmp_messages router=cavefish: 1
  • bmp-in bmp_state_num_announcements router=cavefish: 0
  • bmp-in bmp_state_num_withdrawals router=cavefish: 0
  • bmp-in bmp_state_num_up_peers router=cavefish: 1
  • ~~bmp-in bmp_state_num_up_peers_eor_capable router=cavefish: 0~~
  • bmp-in bmp_state_num_up_peers_with_pending_eors router=cavefish: 0
  • ~~bmp-in task_instrumented_count: 0~~
  • ~~bmp-in task_dropped_count: 0~~
  • ~~bmp-in task_first_poll_count: 0~~
  • ~~bmp-in task_total_first_poll_delay: 0~~
  • ~~bmp-in task_total_idled_count: 0~~
  • ~~bmp-in task_total_idle_duration: 0~~
  • ~~bmp-in task_total_scheduled_count: 0~~
  • ~~bmp-in task_total_scheduled_duration: 0~~
  • ~~bmp-in task_total_poll_count: 0~~
  • ~~bmp-in task_total_poll_duration: 0~~
  • ~~bmp-in task_total_fast_poll_count: 0~~
  • ~~bmp-in task_total_fast_poll_duration: 0~~
  • ~~bmp-in task_total_slow_poll_count: 0~~
  • ~~bmp-in task_total_slow_poll_duration: 0~~
  • ~~bmp-in task_total_short_delay_count: 0~~
  • ~~bmp-in task_total_long_delay_count: 0~~
  • ~~bmp-in task_total_short_delay_duration: 0~~
  • ~~bmp-in task_total_long_delay_duration: 0~~
  • ~~rib-in-post task_instrumented_count: 0~~
  • ~~rib-in-post task_dropped_count: 0~~
  • ~~rib-in-post task_first_poll_count: 0~~
  • ~~rib-in-post task_total_first_poll_delay: 0~~
  • ~~rib-in-post task_total_idled_count: 0~~
  • ~~rib-in-post task_total_idle_duration: 0~~
  • ~~rib-in-post task_total_scheduled_count: 0~~
  • ~~rib-in-post task_total_scheduled_duration: 0~~
  • ~~rib-in-post task_total_poll_count: 0~~
  • ~~rib-in-post task_total_poll_duration: 0~~
  • ~~rib-in-post task_total_fast_poll_count: 0~~
  • ~~rib-in-post task_total_fast_poll_duration: 0~~
  • ~~rib-in-post task_total_slow_poll_count: 0~~
  • ~~rib-in-post task_total_slow_poll_duration: 0~~
  • ~~rib-in-post task_total_short_delay_count: 0~~
  • ~~rib-in-post task_total_long_delay_count: 0~~
  • ~~rib-in-post task_total_short_delay_duration: 0~~
  • ~~rib-in-post task_total_long_delay_duration: 0~~
  • rib-in-post num_updates: 3
  • rib-in-post num_dropped_updates: 3
  • rib-in-post last_update: 2024-07-30 10:40:47.908753018 UTC
  • rib-in-post since_last_update: 41
  • rib-in-post update_set_size: 3
  • rib-in-post rib_unit_num_unique_prefixes: 6
  • rib-in-post rib_unit_num_items: 0
  • rib-in-post rib_unit_num_insert_retries: 2111
  • rib-in-post rib_unit_num_insert_hard_failures: 0
  • rib-in-post rib_unit_num_routes_announced: 0
  • rib-in-post rib_unit_num_modified_route_announcements: 6
  • rib-in-post rib_unit_num_routes_withdrawn: 0
  • rib-in-post rib_unit_num_route_withdrawals_without_announcements: 0
  • rib-in-post rib_unit_insert_duration: 630
  • ~~rib-in-post rib_unit_update_duration: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=1: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=10: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=100: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=1000: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=10000: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration le=+Inf: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration: 0~~
  • ~~rib-in-post rib_merge_update_withdrawal_duration: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=1: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=10: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=100: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=1000: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=10000: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration le=+Inf: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration: 0~~
  • ~~rib-in-post rib_merge_update_announce_duration: 0~~
  • ~~rib-in-pre task_instrumented_count: 0~~
  • ~~rib-in-pre task_dropped_count: 0~~
  • ~~rib-in-pre task_first_poll_count: 0~~
  • ~~rib-in-pre task_total_first_poll_delay: 0~~
  • ~~rib-in-pre task_total_idled_count: 0~~
  • ~~rib-in-pre task_total_idle_duration: 0~~
  • ~~rib-in-pre task_total_scheduled_count: 0~~
  • ~~rib-in-pre task_total_scheduled_duration: 0~~
  • ~~rib-in-pre task_total_poll_count: 0~~
  • ~~rib-in-pre task_total_poll_duration: 0~~
  • ~~rib-in-pre task_total_fast_poll_count: 0~~
  • ~~rib-in-pre task_total_fast_poll_duration: 0~~
  • ~~rib-in-pre task_total_slow_poll_count: 0~~
  • ~~rib-in-pre task_total_slow_poll_duration: 0~~
  • ~~rib-in-pre task_total_short_delay_count: 0~~
  • ~~rib-in-pre task_total_long_delay_count: 0~~
  • ~~rib-in-pre task_total_short_delay_duration: 0~~
  • ~~rib-in-pre task_total_long_delay_duration: 0~~
  • rib-in-pre num_updates: 3
  • rib-in-pre num_dropped_updates: 3
  • rib-in-pre last_update: 2024-07-30 10:40:47.908767626 UTC
  • rib-in-pre since_last_update: 41
  • rib-in-pre update_set_size: 3
  • rib-in-pre rib_unit_num_unique_prefixes: 6
  • rib-in-pre rib_unit_num_items: 0
  • rib-in-pre rib_unit_num_insert_retries: 2111
  • rib-in-pre rib_unit_num_insert_hard_failures: 0
  • rib-in-pre rib_unit_num_routes_announced: 0
  • rib-in-pre rib_unit_num_modified_route_announcements: 6
  • rib-in-pre rib_unit_num_routes_withdrawn: 0
  • rib-in-pre rib_unit_num_route_withdrawals_without_announcements: 0
  • rib-in-pre rib_unit_insert_duration: 586
  • ~~rib-in-pre rib_unit_update_duration: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=1: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=10: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=100: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=1000: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=10000: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration le=+Inf: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration: 0~~
  • ~~rib-in-pre rib_merge_update_withdrawal_duration: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=1: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=10: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=100: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=1000: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=10000: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration le=+Inf: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration: 0~~
  • ~~rib-in-pre rib_merge_update_announce_duration: 0~~
  • metric_assemble_duration: 0

New metrics we want in 0.2:

  • [ ] rib_total_prefixes (v4 vs v6?)
  • [ ] rib_total_muis (unique peers)
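
As a rough sketch of what these two metrics could count, assuming a "mui" identifies one peer (as the "unique peers" gloss suggests) and that the v4/v6 split is exposed as a label: the `Route` struct, the `afi` label name and the output format below are hypothetical, not Rotonda's actual types or endpoint format.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

/// Hypothetical flattened route record, for illustration only.
pub struct Route {
    pub addr: IpAddr,
    pub len: u8,
    /// Assumed to identify the peer the route was learned from.
    pub mui: u32,
}

pub struct RibTotals {
    pub prefixes_v4: usize,
    pub prefixes_v6: usize,
    pub muis: usize,
}

/// Count unique prefixes per address family and unique muis.
pub fn rib_totals(routes: &[Route]) -> RibTotals {
    let mut v4 = HashSet::new();
    let mut v6 = HashSet::new();
    let mut muis = HashSet::new();
    for r in routes {
        match r.addr {
            IpAddr::V4(a) => {
                v4.insert((a, r.len));
            }
            IpAddr::V6(a) => {
                v6.insert((a, r.len));
            }
        }
        muis.insert(r.mui);
    }
    RibTotals {
        prefixes_v4: v4.len(),
        prefixes_v6: v6.len(),
        muis: muis.len(),
    }
}

/// Render the totals in a Prometheus-like text form (label name assumed).
pub fn render(t: &RibTotals) -> String {
    format!(
        "rib_total_prefixes{{afi=\"ipv4\"}} {}\nrib_total_prefixes{{afi=\"ipv6\"}} {}\nrib_total_muis {}\n",
        t.prefixes_v4, t.prefixes_v6, t.muis
    )
}
```

Splitting by a label rather than by two metric names keeps a single `rib_total_prefixes` series that can still be summed across address families. In practice the store would presumably maintain these counts itself rather than scanning routes.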

DRiKE · Aug 20 '24 13:08

We'll postpone this until after 0.2 because other pending refactors might affect these parts.

DRiKE · Nov 21 '24 12:11