scylla-manager
Add a job in SM to collect Scylla Doctor output from the whole cluster
SM and its agents can be leveraged to collect, from a central place, the state of all nodes using Scylla Doctor (https://github.com/scylladb/scylla-doctor).
This will help support quickly and properly verify customers' clusters, their health, and any config drifts.
Can a job be added to run SD on all nodes and collect its outputs?
@karol-kokoszka can you triage? We can certainly share knowledge on how to run SD, or on the internal ways this is gathered for Scylla Cloud or others.
@tarzanek How do we call Scylla Doctor? Is it a CLI that needs to be executed on the hosts, or does it have some API? Or can it be called from any server (let's say the manager server VM)?
Scylla Manager does not SSH into nodes; it calls the Agent's API. That's why I'm asking how the job would be executed.
Do you want to merge it with the Scylla Manager task scheduler?
It's a CLI command that needs to be executed on the target hosts ( https://github.com/scylladb/scylla-doctor/tree/master/scylla-doctor#usage ) as root.
Its results will be in a vitals file that will need to be downloaded to SM.
> It's a CLI command that needs to be executed on the target hosts.

This means we would need to call the agent to execute the CLI and collect the output.
> Its results will be in a vitals file that will need to be downloaded to SM.

Assuming the doctor is executed through an API call to the agent, that's not a problem, as the results will just be in the response payload.
@tarzanek could Scylla Doctor be imported into the agent codebase somehow, for example as a Go dependency? UPDATE: It's Python, so it couldn't.
How do you see scheduling this job? As part of the task scheduler in Manager (the same one we use for repair and backup)? Does it need to be scheduled, or is it rather an ad-hoc job?
> It's a CLI command that needs to be executed on the target hosts ( https://github.com/scylladb/scylla-doctor/tree/master/scylla-doctor#usage ) as root.

I'm concerned about the "as root" part. In general, sm-agent shouldn't have root access to the machine. It's rather suspicious for sm-agent to be able to run arbitrary commands with sudo.
@karol-kokoszka do you know what permissions sm-agent has in the cloud?
cc: @adambabik
We should not proceed with this until we have a much better understanding of what problem we want to solve and what alternatives we have. We don't have any agreement, in terms of architecture, that Scylla Manager should be a hub that integrates various tools.
I see that we have Scylla Doctor as a part of various DevOps workflows in ArgoWF. We also have it run periodically. Why is that not enough?
On-prem customers basically lack these workflows, so SM with its task engine looks like a good place to schedule tasks such as log collection.
Ah, I see, so it's only about the on-prem customers. Then this should be discussed during Manager planning. @karol-kokoszka @Michal-Leszczynski let's add it to the agenda for the Manager planning meeting.