databricks-cli
Add support for Events in Cluster Service
The methods available for Clusters in the REST API, listed here: https://docs.azuredatabricks.net/api/latest/clusters.html, include the "events" one: https://docs.azuredatabricks.net/api/latest/clusters.html#events. However, databricks-cli doesn't have this command implemented. It would be nice to have this feature.
I want to implement this feature, so I've been thinking about it a bit and came up with a command interface like the following:
databricks clusters get-events --help
Usage: databricks clusters get-events [OPTIONS]
Retrieves a list of events about the activity of a cluster.
Options:
--cluster-id CLUSTER_ID Can be found in the URL at https://*.cloud.databric
ks.com/#/setting/clusters/$CLUSTER_ID/configuration
. [required]
--start-time INTEGER The start time in epoch milliseconds. If empty,
returns events starting from the beginning of time.
--end-time INTEGER The end time in epoch milliseconds. If empty,
returns events up to the current time.
--order TEXT The order to list events in; either ASC or DESC.
Defaults to DESC.
--event-types TEXT An optional set of event types to filter on. If
empty, all event types are returned.
--offset INTEGER The offset in the result set. Defaults to 0 (no
offset).
--limit INTEGER The maximum number of events to include in a page
of events. If not specified, it will return all the
events.
--debug Debug Mode. Shows full stack trace on error.
--profile TEXT CLI connection profile to use. The default profile
is "DEFAULT".
-h, --help Show this message and exit.
Any thoughts about this? Is this command interface suitable? @andrewmchen
Some nits:
- I think the command name should be events to be consistent with the API naming.
- --event-types would be better as --event-type and be a multiple option (https://click.palletsprojects.com/en/5.x/options/#multiple-options), I think.
- limit should default to the API default of 50, I think.
Is the output going to be simply the API output?
@andrewmchen Thanks for the feedback! After applying your comments, the CLI has the following format:
databricks clusters events --help
Usage: databricks clusters events [OPTIONS]
Retrieves a list of events about the activity of a cluster.
Options:
--cluster-id CLUSTER_ID Can be found in the URL at https://*.cloud.d
atabricks.com/#/setting/clusters/$CLUSTER_ID
/configuration. [required]
--start-time INTEGER The start time in epoch milliseconds. If
empty, returns events starting from the
beginning of time.
--end-time INTEGER The end time in epoch milliseconds. If
empty, returns events up to the current
time.
--order ORDER It can be ASC or DESC.
--event-type CLUSTER_EVENT_TYPE
Can be found in the URL at https://*.cloud.d
atabricks.com/api/latest/clusters.html#clust
ereventsclustereventtype
--offset INTEGER The offset in the result set. Defaults to 0
(no offset).
--limit INTEGER The maximum number of events to include in a
page of events.
--debug Debug Mode. Shows full stack trace on error.
--profile TEXT CLI connection profile to use. The default
profile is "DEFAULT".
-h, --help Show this message and exit.
(Note that the types of --order and --event-type changed to follow the API method definition.)
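For illustration, the option shapes in the help text above could be declared with click roughly as follows. This is only a sketch: the function name events_cli and the short EVENT_TYPES list are assumptions (the list contains just the types that appear in this thread's sample output), and the command prints the request body instead of calling the API. The event_types key follows the API parameter description quoted in the first proposal.

```python
import json

import click

# Event type names taken from the sample output in this thread; the real API
# defines more, so this list is illustrative only.
EVENT_TYPES = ["TERMINATING", "RUNNING", "RESTARTING"]


@click.command(name="events")
@click.option("--cluster-id", required=True)
@click.option("--start-time", type=int, default=None)
@click.option("--end-time", type=int, default=None)
@click.option("--order", type=click.Choice(["ASC", "DESC"]), default=None)
@click.option("--event-type", "event_types", multiple=True,
              type=click.Choice(EVENT_TYPES))
@click.option("--offset", type=int, default=None)
@click.option("--limit", type=int, default=None)
def events_cli(cluster_id, start_time, end_time, order, event_types, offset, limit):
    """Print the request body that would be sent to the events endpoint."""
    body = {"cluster_id": cluster_id}
    optional = {
        "start_time": start_time,
        "end_time": end_time,
        "order": order,
        "event_types": list(event_types),
        "offset": offset,
        "limit": limit,
    }
    # Only include values the user actually supplied so the API defaults apply.
    body.update({k: v for k, v in optional.items() if v not in (None, [])})
    click.echo(json.dumps(body, sort_keys=True))
```

With multiple=True, repeating --event-type on the command line collects the values into a tuple, and click.Choice rejects values outside the allowed set before the function body runs.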
Answering your question, my idea about the output was the following:
- As the events API method is paginated, there is no way to list all the events in one API call. However, it might be useful to get all cluster events in databricks-cli without having to paginate outside the CLI.
- Also, I thought that the "next_page" and "total_count" attributes returned by the API method could be removed from the output. The output would still be in JSON format.
So, for example, having the following execution:
databricks clusters events --cluster-id "1202-211320-brick1"
The output will be the following:
{
"events": [{
"cluster_id": "1202-211320-brick1",
"timestamp": 1534371918659,
"type": "TERMINATING",
"details": {
"reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
}
}
}
}, {
"cluster_id": "1202-211320-brick1",
"timestamp": 1534358289590,
"type": "RUNNING",
"details": {
"current_num_workers": 2,
"target_num_workers": 2
}
}, {
"cluster_id": "1202-211320-brick1",
"timestamp": 1533225298406,
"type": "RESTARTING",
"details": {
"user": "admin"
}
}]
}
@teresafds What is the motivation/use case for returning all cluster events? I feel that the 50 most recent events are enough for most users. I think having the CLI be transparent and return next_page and total_count would be simplest. I think allowing offset as a CLI param lets you paginate efficiently using the CLI if needed.
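To make the pagination point concrete, here is a minimal sketch of how a caller could walk through every page when the response is passed through unchanged. The helper name iter_all_events and the fetch_page callable are hypothetical, and the sketch assumes that next_page is absent from the last page.

```python
def iter_all_events(fetch_page, cluster_id):
    """Yield every event by following next_page until the API stops returning it.

    fetch_page is a hypothetical callable that takes the request body as a dict
    and returns the decoded JSON response as a dict.
    """
    request = {"cluster_id": cluster_id}
    while request is not None:
        response = fetch_page(request)
        for event in response.get("events", []):
            yield event
        # The response's next_page holds the parameters for the following
        # request; its absence is treated here as "no more pages".
        request = response.get("next_page")
```

Because next_page already carries the offset for the following request, the caller never has to compute offsets by hand.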
@evanye
Thanks for your questions and feedback!
The use case I had in mind was the following: I have a cluster and I want to retrieve all of its historic events (for example, if a cluster has the auto-termination option enabled, there can be many events for the cluster turning on and off). However, this use case may be too specific to be the general rule in the CLI.
I agree with you that having the CLI be transparent with respect to the REST API method result is a good idea (it reflects the same behaviour as the rest of the commands in the CLI).
So I have rewritten the example for the events command as follows:
Having the following execution:
databricks clusters events --cluster-id "1202-211320-brick1"
The output will be the following:
{
"events": [{
"cluster_id": "1202-211320-brick1",
"timestamp": 1534371918659,
"type": "TERMINATING",
"details": {
"reason": {
"code": "INACTIVITY",
"parameters": {
"inactivity_duration_min": "120"
}
}
}
}, {
"cluster_id": "1202-211320-brick1",
"timestamp": 1534358289590,
"type": "RUNNING",
"details": {
"current_num_workers": 2,
"target_num_workers": 2
}
}, {
"cluster_id": "1202-211320-brick1",
"timestamp": 1533225298406,
"type": "RESTARTING",
"details": {
"user": "admin"
}
}],
"next_page": {
"cluster_id": "1202-211320-brick1",
"end_time": 1534371918659,
"offset": 50
},
"total_count": 55
}
What do you think? Is this example more suitable? I was also wondering about another thing: is it OK to pass so many parameters on the command line, or would it be better to pass them as a JSON string (as is done in the create-cluster method, for example)?
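As a rough illustration of the two styles in the question above, the following sketch accepts a JSON string, individual options, or both. The function build_events_request and its precedence rule (individual options override the JSON string) are assumptions made for this example, not existing databricks-cli behaviour.

```python
import json


def build_events_request(json_str=None, **options):
    """Build the request body from a JSON string, individual options, or both.

    Individual options override keys from the JSON string; options left as
    None are ignored. Both the function and this precedence rule are
    illustrative choices, not existing databricks-cli behaviour.
    """
    body = json.loads(json_str) if json_str else {}
    body.update({key: value for key, value in options.items() if value is not None})
    if "cluster_id" not in body:
        raise ValueError("cluster_id is required")
    return body
```

For example, build_events_request('{"cluster_id": "1202-211320-brick1"}', limit=50) would merge the limit into the body parsed from the JSON string.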