go.d.plugin

Cassandra module

Open thiagoftsm opened this issue 3 years ago • 11 comments

This PR implements what was defined for issue https://github.com/netdata/netdata/issues/13700

thiagoftsm avatar Oct 06 '22 01:10 thiagoftsm

@thiagoftsm I see you are collecting only summary metrics, why?

ilyam8 avatar Oct 11 '22 12:10 ilyam8

> @thiagoftsm I see you are collecting only summary metrics, why?

To simplify the transition for users moving to Netdata from other software. Other companies also collect summary metrics.

thiagoftsm avatar Oct 11 '22 19:10 thiagoftsm

> To simplify the transition for users moving to Netdata from other software. Other companies also collect summary metrics.

I very much doubt that. I am pretty sure they collect more than just summary metrics.

> Other companies also collect summary metrics.

Can you give a link to those companies' collector source code?

cc @shyamvalsan

ilyam8 avatar Oct 12 '22 07:10 ilyam8

@ilyam8 the current description in https://github.com/netdata/netdata/issues/13700 points us to the summary metrics.

thiagoftsm avatar Oct 12 '22 11:10 thiagoftsm

@thiagoftsm you get summary metrics on the Cloud overview page.

ilyam8 avatar Oct 12 '22 11:10 ilyam8

@thiagoftsm my point is that we would need to delete the summary metrics if we add per-instance metrics, so why add the summary in the first place?

ilyam8 avatar Oct 12 '22 11:10 ilyam8

@thiagoftsm @ilyam8 I am not sure I follow this discussion - what is the concern here? what exactly do you mean by "summary" metrics in this context?

shyamvalsan avatar Oct 13 '22 12:10 shyamvalsan

Let's take for example this metric

org_apache_cassandra_metrics_table_count
# HELP org_apache_cassandra_metrics_table_count Attribute exposed for management org.apache.cassandra.metrics:name=CompactionBytesWritten,type=Table,attribute=Count
# TYPE org_apache_cassandra_metrics_table_count untyped
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="events",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="view_builds_in_progress",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="parent_repair_history",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="repair_history",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="built_views",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="indexes",name="LiveDiskSpaceUsed",} 17084.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_members",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges_v2",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="repair_history",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_permissions",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="tables",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="repairs",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="batches",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_members",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="events",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="paxos",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_permissions",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events_v2",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="view_builds_in_progress",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="types",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_members",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges_v2",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges_v2",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="sessions",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="views",name="TotalDiskSpaceUsed",} 16961.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="prepared_statements",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="paxos",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="table_estimates",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="columns",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="triggers",name="TotalDiskSpaceUsed",} 17084.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="functions",name="TotalDiskSpaceUsed",} 17342.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="parent_repair_history",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="IndexInfo",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="indexes",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events_v2",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="prepared_statements",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="network_permissions",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="local",name="LiveDiskSpaceUsed",} 8643.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="size_estimates",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="batches",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="functions",name="LiveDiskSpaceUsed",} 17342.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="tables",name="LiveDiskSpaceUsed",} 29641.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="role_permissions",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="triggers",name="LiveDiskSpaceUsed",} 17084.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="repairs",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="functions",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="view_build_status",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="IndexInfo",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="indexes",name="TotalDiskSpaceUsed",} 17084.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="paxos",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="repairs",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="sstable_activity",name="TotalDiskSpaceUsed",} 13230.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="events",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers_v2",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="table_estimates",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="resource_role_permissons_index",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="size_estimates",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="view_build_status",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="parent_repair_history",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peer_events_v2",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="resource_role_permissons_index",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="aggregates",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="prepared_statements",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="columns",name="TotalDiskSpaceUsed",} 35012.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers_v2",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges_v2",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="repair_history",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="batches",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="built_views",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="table_estimates",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_distributed",scope="view_build_status",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="views",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers_v2",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="types",name="LiveDiskSpaceUsed",} 16961.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="local",name="CompactionBytesWritten",} 679.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="roles",name="LiveDiskSpaceUsed",} 5181.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="tables",name="TotalDiskSpaceUsed",} 29641.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="size_estimates",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="local",name="TotalDiskSpaceUsed",} 8643.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="sstable_activity",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges_v2",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="resource_role_permissons_index",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="view_builds_in_progress",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="keyspaces",name="TotalDiskSpaceUsed",} 17979.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="keyspaces",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="sstable_activity",name="LiveDiskSpaceUsed",} 13230.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="IndexInfo",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="dropped_columns",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="aggregates",name="TotalDiskSpaceUsed",} 17342.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="columns",name="LiveDiskSpaceUsed",} 35012.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="built_views",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="peers",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="sessions",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="dropped_columns",name="TotalDiskSpaceUsed",} 17822.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="keyspaces",name="LiveDiskSpaceUsed",} 17979.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="compaction_history",name="LiveDiskSpaceUsed",} 27428.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="roles",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="types",name="TotalDiskSpaceUsed",} 16961.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="network_permissions",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="transferred_ranges_v2",name="TotalDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="roles",name="TotalDiskSpaceUsed",} 5181.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="aggregates",name="LiveDiskSpaceUsed",} 17342.0
org_apache_cassandra_metrics_table_count{keyspace="system_traces",scope="sessions",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="compaction_history",name="CompactionBytesWritten",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="compaction_history",name="TotalDiskSpaceUsed",} 27428.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="available_ranges",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="network_permissions",name="LiveDiskSpaceUsed",} 0.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="dropped_columns",name="LiveDiskSpaceUsed",} 17822.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="views",name="LiveDiskSpaceUsed",} 16961.0
org_apache_cassandra_metrics_table_count{keyspace="system_schema",scope="triggers",name="CompactionBytesWritten",} 0.0

Instead of creating a chart per keyspace and scope (with keyspace and scope labels), we create one chart with summary metrics (which the Netdata Cloud overview functionality would provide anyway).
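To make the difference concrete, here is a minimal Go sketch of the two views over the same samples. It is not the collector's code: `sampleRe`, `tableKey`, and `aggregate` are hypothetical names, and the sketch assumes the labels always appear in the order keyspace, scope, name, as they do in the output above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sampleRe matches one sample line of the metric shown above.
// Assumption: labels always appear in the order keyspace, scope, name.
var sampleRe = regexp.MustCompile(
	`^org_apache_cassandra_metrics_table_count\{keyspace="([^"]*)",scope="([^"]*)",name="([^"]*)",?\}\s+(\S+)$`)

// tableKey identifies one per-instance series (hypothetical type for this sketch).
type tableKey struct {
	Keyspace, Scope, Name string
}

// aggregate parses exposition text and returns both views of the same data:
// perTable keeps one value per keyspace/scope/name, summary sums values by name only.
func aggregate(text string) (perTable map[tableKey]float64, summary map[string]float64) {
	perTable = make(map[tableKey]float64)
	summary = make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		m := sampleRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue // skip "# HELP", "# TYPE", blank and unrelated lines
		}
		v, err := strconv.ParseFloat(m[4], 64)
		if err != nil {
			continue // skip samples whose value is not a number
		}
		perTable[tableKey{m[1], m[2], m[3]}] = v
		summary[m[3]] += v
	}
	return perTable, summary
}

func main() {
	// Three sample lines copied from the output above.
	const text = `# TYPE org_apache_cassandra_metrics_table_count untyped
org_apache_cassandra_metrics_table_count{keyspace="system",scope="local",name="LiveDiskSpaceUsed",} 8643.0
org_apache_cassandra_metrics_table_count{keyspace="system_auth",scope="roles",name="LiveDiskSpaceUsed",} 5181.0
org_apache_cassandra_metrics_table_count{keyspace="system",scope="local",name="CompactionBytesWritten",} 679.0
`
	perTable, summary := aggregate(text)
	fmt.Println("per-instance series:", len(perTable))
	fmt.Println("summary LiveDiskSpaceUsed:", summary["LiveDiskSpaceUsed"])
}
```

The per-instance map grows with the number of keyspace/scope pairs, while the summary map has one entry per metric name.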

ilyam8 avatar Oct 13 '22 12:10 ilyam8

OK understood @ilyam8

But would creating charts per keyspace and per scope lead us into the same problem we see on PostgreSQL when there are thousands of tables and indexes? Personally, if there's a way we can capture more info and not suffer performance implications I am all for it.

I think the number of keyspaces is only limited by available memory, but based on feedback from users and the community, the recommended number of keyspaces seems to be <200, or even <150. No idea for scope, though.

Also @thiagoftsm there are potentially a LOT of metrics we can collect here, but I think to start with we should limit ourselves to what is mentioned in https://github.com/netdata/netdata/issues/13700. If there is user demand for more, we can address it later. The important thing is to get the collector with basic metrics out to users as soon as we can.

shyamvalsan avatar Oct 13 '22 12:10 shyamvalsan

Ah, if collecting only summary metrics is what you think will do for now, then OK.

ilyam8 avatar Oct 13 '22 13:10 ilyam8

Now that we have agreed on the metrics, I have fixed the issues we had with the last commits.

thiagoftsm avatar Oct 14 '22 02:10 thiagoftsm

@thiagoftsm I am going to merge the PR and make adjustments (if needed) afterwards.

And for some reason, you haven't added any readme 🤷‍♂️

ilyam8 avatar Oct 17 '22 09:10 ilyam8

Thank you for your help @ilyam8 ! :)

thiagoftsm avatar Oct 17 '22 11:10 thiagoftsm

@thiagoftsm I think all the metrics except one (Exceptions: requests for which Cassandra encountered an error) have been added.

Some of the charts need to be organized into sections, but that should be a minor PR.

Let's have a chat about this today.

shyamvalsan avatar Oct 17 '22 11:10 shyamvalsan

Hello @shyamvalsan ,

I will take a look, but as far as I remember, I could not find this Exceptions metric in the documentation. Do you remember the source of this information?

Which sections? Are you talking about the same sections as in the issue? If yes, we only need to organize families, but this will create a third level of organization in our dashboard.

Best regards!

thiagoftsm avatar Oct 17 '22 11:10 thiagoftsm

@thiagoftsm

Exceptions = StorageExceptions I believe. It is the number of internal exceptions caught.

By sections I mean following the organization mentioned in https://github.com/netdata/netdata/issues/13700. The throughput, latency and cache sections are good. But the rest of the charts need to be clubbed together under the following sections: Disk usage, Garbage collection, Errors, as shown below. This does NOT require a third level.

  • Disk usage
    • Load (Disk space used on a node in bytes)
    • Total disk space used (Disk space used by column family, in bytes)
    • Compaction tasks completed (Total count of completed compaction tasks)
    • Compaction tasks in queue (Total count of pending compaction tasks in queue)
  • Garbage collection
    • ParNew count (Number of young-generation collections)
    • ParNew time (Elapsed time of young-generation collection in milliseconds)
    • ConcurrentMarkSweep count (Number of old-generation collections)
    • ConcurrentMarkSweep time (Elapsed time of old-generation collection in milliseconds)
  • Errors
    • Exceptions (Requests for which Cassandra encountered an error)
    • Timeout exceptions (Requests not acknowledged within timeout window)
    • Unavailable exceptions (Requests for which required number of nodes was unavailable)
    • Pending tasks (Tasks in queue awaiting a thread for processing)
    • Blocked tasks (Tasks that have not yet been queued for processing)
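
As a rough illustration of the grouping above, the sections can be expressed as a flat title-to-family lookup, which keeps the dashboard at two levels. This is only a sketch: the `chart` type, the `familyOf` map, the `assignFamilies` helper, and the "Other" fallback are hypothetical and not taken from the module.

```go
package main

import "fmt"

// chart is a hypothetical, simplified stand-in for a dashboard chart definition.
type chart struct {
	Title  string
	Family string
}

// familyOf maps chart titles to the sections listed above.
var familyOf = map[string]string{
	"Load":                       "Disk usage",
	"Total disk space used":      "Disk usage",
	"Compaction tasks completed": "Disk usage",
	"Compaction tasks in queue":  "Disk usage",
	"ParNew count":               "Garbage collection",
	"ParNew time":                "Garbage collection",
	"ConcurrentMarkSweep count":  "Garbage collection",
	"ConcurrentMarkSweep time":   "Garbage collection",
	"Exceptions":                 "Errors",
	"Timeout exceptions":         "Errors",
	"Unavailable exceptions":     "Errors",
	"Pending tasks":              "Errors",
	"Blocked tasks":              "Errors",
}

// assignFamilies returns one chart per title with its family filled in.
// Titles missing from familyOf fall back to "Other" (an assumption of this sketch).
func assignFamilies(titles []string) []chart {
	charts := make([]chart, 0, len(titles))
	for _, t := range titles {
		family, ok := familyOf[t]
		if !ok {
			family = "Other"
		}
		charts = append(charts, chart{Title: t, Family: family})
	}
	return charts
}

func main() {
	for _, c := range assignFamilies([]string{"Load", "ParNew time", "Timeout exceptions"}) {
		fmt.Printf("%s -> %s\n", c.Title, c.Family)
	}
}
```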

shyamvalsan avatar Oct 17 '22 12:10 shyamvalsan

btw it is Cassandra not cassandra 😄

shyamvalsan avatar Oct 17 '22 12:10 shyamvalsan

All right, I am working with Ilya to address everything. :)

thiagoftsm avatar Oct 17 '22 19:10 thiagoftsm