Investigate debug logs of ecchronos

Open masokol opened this issue 1 year ago • 6 comments

Investigate debug logs of ecchronos to see how much is actually logged. Maybe some stuff should be throttled or removed/moved to trace.

masokol avatar Aug 08 '23 12:08 masokol

There are a lot of irrelevant logs at debug level. We should move them to trace instead.

masokol avatar Aug 09 '23 06:08 masokol

I will take this

VictorCavichioli avatar Oct 05 '23 11:10 VictorCavichioli

Debug is logged in a number of places. Which of these should be kept, and which should be changed to info? A DoD (Definition of Done) needs to be defined for this issue.

epkdaek@elx721027t9:~/github/ecchronos$ git grep "LOG.debug"
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/ECChronosInternals.java: LOG.debug("Table {} last repaired at {}", tableReference, lastRepairedAt);
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/ECChronosInternals.java: LOG.debug("Table {} remaining repair time {}", tableReference, remainingRepairTime);
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Watching for changes in {}", absoluteFilePath);
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Watch service has been closed");
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Received event for {}/{}", baseDirectory, file);
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/spring/CassandraHealthIndicator.java: LOG.debug("Unable to connect over JMX", e);
application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/spring/CassandraHealthIndicator.java: LOG.debug("Unable to connect over CQL", e);
connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalJmxConnectionProvider.java: LOG.debug("Connecting JMX through {}, credentials: {}, tls: {}", jmxUrl, authEnabled, tlsEnabled);
connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalJmxConnectionProvider.java: LOG.debug("Connected JMX for {}", jmxUrl);
connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalNativeConnectionProvider.java: LOG.debug("Connecting to {}({}), local data center: {}", contactEndPoint, initialContact.getHostId(),
connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalNativeConnectionProvider.java: LOG.debug("Driver configuration: {}", driverConfigLoader.getInitialConfig().getDefaultProfile().entrySet());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/CASLock.java: LOG.debug("Locally highest priority ({}) is higher than current ({}), will not remove",
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/HostStatesImpl.java: LOG.debug("Host {} marked as up", host);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/HostStatesImpl.java: LOG.debug("Host {} marked as down", host);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/LockCache.java: LOG.debug("Encountered cached locking failure, throwing exception", e);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/TableStorageStatesImpl.java: LOG.debug("{} -> {}", tableReference, diskSpaceUsed);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("{} switched state to UP.", node);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("{} switched state to DOWN.", node);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("Session during setupConfiguration call was null.");
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/IncrementalRepairJob.java: LOG.debug("{} - last successful run: {}", this, myLastSuccessfulRun);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/IncrementalRepairTask.java: LOG.debug("{} for range {}", repairStatus, range);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairGroup.java: LOG.debug("Table {} running repair job {}", myTableReference, myReplicaRepairGroup);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairGroup.java: LOG.debug("", e);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("Found cached locking failure for {}, rethrowing", repairResource, e);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("{} - Unable to get repair resource lock '{}', releasing previously acquired locks - {}",
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("Lock ({} in datacenter {}) got error {}", resource, dataCenter, e.getMessage());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("{} completed successfully", this);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("Notification {}", notification.toString());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("Unknown JMXConnectionNotification type: {}", notification.getType());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeOnDemandRepairJob.java: LOG.debug("Total tokens for this job are 0");
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeRepairTask.java: LOG.debug("Unknown ranges: {}", unknownRanges);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeRepairTask.java: LOG.debug("Completed ranges: {}", completedRanges);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/EccRepairHistory.java: LOG.debug("Token range {} was not found in metadata", tokenRange);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/RepairStateImpl.java: LOG.debug("Table {} fully repaired at {}, next repair at/after {}", myTableReference,
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/RepairStateImpl.java: LOG.debug("Table {} partially repaired at {}, next repair at/after {}", myTableReference,
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("No last repaired at found for {}, iterating over all repair entries", tableReference);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Table {} snapshot created at {}, iterating repair entries until that time", tableReference,
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Ignoring entry {}, repair was not successful", repairEntry);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Ignoring entry {}, replicas {} not matching participants", repairEntry, nodes);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Run policy {} added", runPolicy);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Run policy {} removed", runPolicy);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Job {} rejected for {} ms by {}", job, nextRun, runPolicy);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Trying to acquire lock for {}", task);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Removing job: {}", job);
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Adding job: {}, Priority: {}", job, job.getPriority());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Retrieving job: {}, Priority: {}", job, job.getPriority());
core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/utils/ReplicatedTableProviderImpl.java: LOG.debug("Keyspace {} not replicated by local node, ignoring.", keyspace);
epkdaek@elx721027t9:~/github/ecchronos$

DanielwEriksson avatar Apr 25 '24 12:04 DanielwEriksson

This is the suggestion. Each entry is prefixed with the proposed level (INFO or DEBUG):

epkdaek@elx721027t9:~/github/ecchronos$ git grep "LOG.debug"
INFO: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/ECChronosInternals.java: LOG.debug("Table {} last repaired at {}", tableReference, lastRepairedAt);
INFO: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/ECChronosInternals.java: LOG.debug("Table {} remaining repair time {}", tableReference, remainingRepairTime);
INFO: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Watching for changes in {}", absoluteFilePath);
INFO: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Watch service has been closed");
DEBUG: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/config/ConfigRefresher.java: LOG.debug("Received event for {}/{}", baseDirectory, file);
DEBUG: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/spring/CassandraHealthIndicator.java: LOG.debug("Unable to connect over JMX", e);
DEBUG: application/src/main/java/com/ericsson/bss/cassandra/ecchronos/application/spring/CassandraHealthIndicator.java: LOG.debug("Unable to connect over CQL", e);
INFO: connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalJmxConnectionProvider.java: LOG.debug("Connecting JMX through {}, credentials: {}, tls: {}", jmxUrl, authEnabled, tlsEnabled);
INFO: connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalJmxConnectionProvider.java: LOG.debug("Connected JMX for {}", jmxUrl);
INFO: connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalNativeConnectionProvider.java: LOG.debug("Connecting to {}({}), local data center: {}", contactEndPoint, initialContact.getHostId(),
INFO: connection.impl/src/main/java/com/ericsson/bss/cassandra/ecchronos/connection/impl/LocalNativeConnectionProvider.java: LOG.debug("Driver configuration: {}", driverConfigLoader.getInitialConfig().getDefaultProfile().entrySet());
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/CASLock.java: LOG.debug("Locally highest priority ({}) is higher than current ({}), will not remove",
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/HostStatesImpl.java: LOG.debug("Host {} marked as up", host);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/HostStatesImpl.java: LOG.debug("Host {} marked as down", host);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/LockCache.java: LOG.debug("Encountered cached locking failure, throwing exception", e);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/TableStorageStatesImpl.java: LOG.debug("{} -> {}", tableReference, diskSpaceUsed);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("{} switched state to UP.", node);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("{} switched state to DOWN.", node);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/DefaultRepairConfigurationProvider.java: LOG.debug("Session during setupConfiguration call was null.");
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/IncrementalRepairJob.java: LOG.debug("{} - last successful run: {}", this, myLastSuccessfulRun);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/IncrementalRepairTask.java: LOG.debug("{} for range {}", repairStatus, range);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairGroup.java: LOG.debug("Table {} running repair job {}", myTableReference, myReplicaRepairGroup);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairGroup.java: LOG.debug("", e);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("Found cached locking failure for {}, rethrowing", repairResource, e);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("{} - Unable to get repair resource lock '{}', releasing previously acquired locks - {}",
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairLockFactoryImpl.java: LOG.debug("Lock ({} in datacenter {}) got error {}", resource, dataCenter, e.getMessage());
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("{} completed successfully", this);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("Notification {}", notification.toString());
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/RepairTask.java: LOG.debug("Unknown JMXConnectionNotification type: {}", notification.getType());
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeOnDemandRepairJob.java: LOG.debug("Total tokens for this job are 0");
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeRepairTask.java: LOG.debug("Unknown ranges: {}", unknownRanges);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/VnodeRepairTask.java: LOG.debug("Completed ranges: {}", completedRanges);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/EccRepairHistory.java: LOG.debug("Token range {} was not found in metadata", tokenRange);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/RepairStateImpl.java: LOG.debug("Table {} fully repaired at {}, next repair at/after {}", myTableReference,
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/RepairStateImpl.java: LOG.debug("Table {} partially repaired at {}, next repair at/after {}", myTableReference,
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("No last repaired at found for {}, iterating over all repair entries", tableReference);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Table {} snapshot created at {}, iterating repair entries until that time", tableReference,
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Ignoring entry {}, repair was not successful", repairEntry);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/repair/state/VnodeRepairStateFactoryImpl.java: LOG.debug("Ignoring entry {}, replicas {} not matching participants", repairEntry, nodes);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Run policy {} added", runPolicy);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Run policy {} removed", runPolicy);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Job {} rejected for {} ms by {}", job, nextRun, runPolicy);
DEBUG: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduleManagerImpl.java: LOG.debug("Trying to acquire lock for {}", task);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Removing job: {}", job);
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Adding job: {}, Priority: {}", job, job.getPriority());
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/scheduling/ScheduledJobQueue.java: LOG.debug("Retrieving job: {}, Priority: {}", job, job.getPriority());
INFO: core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/utils/ReplicatedTableProviderImpl.java: LOG.debug("Keyspace {} not replicated by local node, ignoring.", keyspace);
epkdaek@elx721027t9:~/github/ecchronos$

DanielwEriksson avatar Apr 26 '24 07:04 DanielwEriksson

Some might even be changed to error?

LOG.debug("Unable to connect over JMX", e); and LOG.debug("Unable to connect over CQL", e);

sound like errors to me.

DanielwEriksson avatar Apr 26 '24 07:04 DanielwEriksson

The changed logging shall use ThrottlingLogger:

https://github.com/Ericsson/ecchronos/blob/master/core/src/main/java/com/ericsson/bss/cassandra/ecchronos/core/utils/logging/ThrottlingLogger.java
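For anyone unfamiliar with the idea, throttled logging roughly means "emit a given message at most once per interval". The sketch below only illustrates that idea. It is not the ecChronos ThrottlingLogger and does not mirror its API: the class name ThrottledLogSketch, the Consumer sink and the injectable clock are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch of per-message log throttling.
 * NOT the ecChronos ThrottlingLogger; names and signatures are illustrative only.
 */
public final class ThrottledLogSketch
{
    private final Consumer<String> mySink;          // stands in for a real logger call
    private final long myIntervalNanos;             // minimum time between identical messages
    private final LongSupplier myNanoClock;         // injectable so the behaviour can be tested
    private final Map<String, Long> myLastLogged = new ConcurrentHashMap<>();

    public ThrottledLogSketch(Consumer<String> sink, long interval, TimeUnit unit, LongSupplier nanoClock)
    {
        mySink = sink;
        myIntervalNanos = unit.toNanos(interval);
        myNanoClock = nanoClock;
    }

    /**
     * Emits the message unless the same message was emitted within the interval.
     *
     * @return true if the message was passed to the sink, false if it was suppressed
     */
    public boolean log(String message)
    {
        long now = myNanoClock.getAsLong();
        boolean[] emitted = {false};
        // compute() is atomic per key, so two threads cannot both win the same interval
        myLastLogged.compute(message, (key, last) ->
        {
            if (last == null || now - last >= myIntervalNanos)
            {
                emitted[0] = true;
                return now;
            }
            return last;
        });
        if (emitted[0])
        {
            mySink.accept(message);
        }
        return emitted[0];
    }

    public static void main(String[] args)
    {
        ThrottledLogSketch log = new ThrottledLogSketch(System.out::println, 1, TimeUnit.SECONDS, System::nanoTime);
        log.log("Host marked as up"); // emitted
        log.log("Host marked as up"); // suppressed if called within one second of the first
    }
}
```

Keying on the message template rather than the formatted message keeps the map small, at the cost of also suppressing messages that differ only in their arguments.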

DanielwEriksson avatar May 20 '24 08:05 DanielwEriksson

Part of this will be done in issue #666

jwaeab avatar May 22 '24 11:05 jwaeab