astronomer-providers
Implement usage metrics of Operators/Sensors
Add module path and inheritance (sub-class) details to the scheduler_job.py log info, so that Astro cloud can pick up the Splunk logs to show the usage count of each Operator and Sensor.
https://www.notion.so/astronomerio/Approach-to-find-usage-metrics-of-Operators-Sensors-947dd9d9968444fe984ee34ef6c4a420
Created a PR in OSS Airflow for adding op_classpath to the scheduler log, in order to track where the operator/sensor originated from.
Slack conversation on adding op_classpath in scheduler_job.py, where the logs are generated, so that Astro cloud could pick up the info. PR raised in OSS:
https://astronomer.slack.com/archives/C02PABPU6B0/p1658846250206209
As per the discussion over Slack, @steveyz-astro and @ashb confirmed that all worker logs get ingested into Splunk. Need to look into the metadata of the task instance log: if the task instance log has the class path, we can collate it to get metrics; if it doesn't, we need to add it to the worker log.
Need to connect with @ashb regarding the log; instead of making changes to OSS Airflow, we should try to get the log from Runtime.
Connected with @ashb and @chris; decided to follow these steps to get the usage metrics without making any changes in OSS:
- Ask OL team if we can do an ad-hoc query against the DB
- Explore timeline for getting OL data into DW
- Change operator field in Airflow to be full classpath, not just class name (and work out impact on OSS code)
- Add new log line to Runtime worker output (note worker logs not task logs!)
Connected with @julienledem regarding running an ad-hoc query against the DB. He mentioned we can query the OpenLineage data, but it would be per org, and it only has the class name, not the class path (which could be added if needed). He also pointed me to the existing operator usage dashboard from @shillion ’s team: https://app.sigmacomputing.com/astronomer/workbook/Astro-Customer-Usage-Dashboard-2LB0JYkylKlxgJtXlRCdpU?:nodeId=3_lTDNxmJY
Got access to the DWH and DWH_DEV DBs from @chris.
As per the discussion in this thread, they currently don't have OpenLineage data in Astro GCS or the Snowflake DB; they have an RDS instance that holds all the individual org database instances, and they plan to move that data into Astro GCS and from there into Snowflake. I am following up on that.
Once we get the data into the DWH Snowflake, we will be able to pick up the classpath from the OpenLineage data.
@bharanidharan14 to list discussions and thoughts on https://www.notion.so/astronomerio/Approach-to-find-usage-metrics-of-Operators-Sensors-947dd9d9968444fe984ee34ef6c4a420, or decide if we need a new Notion page.
Connected with @Jed Cunningham over Slack. He suggested using the new listener mechanism for our case (https://airflow.apache.org/docs/apache-airflow/stable/listeners.html): we add a listener in Astro Runtime (https://github.com/astronomer/astro-runtime/blob/main/package/astronomer/runtime/plugin.py) to log the class path, so that the logs get into Splunk as well.
Need to try what @jedcunningham suggested
By adding a plugin and listener locally, I was able to get the class path along with the task instance details.
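The local plugin-plus-listener experiment looks roughly like the sketch below. The hook name comes from the Airflow listeners API (assuming Airflow 2.5+, where plugins may register listener objects); the plugin/listener class names and the exact log-line format are made up for illustration, and the airflow import is guarded so the classpath helper reads on its own.

```python
# Minimal sketch (assumed names) of a plugin + listener that logs the
# operator classpath with each task instance, so worker logs carry it.
import logging

log = logging.getLogger("astronomer.usage_metrics")

def operator_classpath(task) -> str:
    """Full dotted path of the task's operator class."""
    cls = type(task)
    return f"{cls.__module__}.{cls.__qualname__}"

try:
    from airflow.listeners import hookimpl
    from airflow.plugins_manager import AirflowPlugin

    class UsageMetricsListener:
        @hookimpl
        def on_task_instance_running(self, previous_state, task_instance, session):
            # In the worker, task_instance.task is the live operator object.
            log.info(
                "TaskInstance Details: dag_id=%s task_id=%s classpath=%s",
                task_instance.dag_id,
                task_instance.task_id,
                operator_classpath(task_instance.task),
            )

    class UsageMetricsPlugin(AirflowPlugin):
        name = "usage_metrics_plugin"
        listeners = [UsageMetricsListener()]
except ImportError:
    # Airflow is not installed in this illustrative context.
    pass
```

Because listeners run inside the worker process, the log line lands in the worker logs (not the task logs), which is exactly the stream that gets shipped to Splunk.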
Made changes to the astro-runtime repo; I currently don't have write access to it and have posted in the astro-runtime channel.
Raised a PR for adding the listener to the astro-runtime repo. This listener will get the task instance details and log them, so that the data team can capture this log into a table.
PR: https://github.com/astronomer/astro-runtime/pull/349
Requested @jedcunningham to review the PR and raised the same concern in the astro-runtime channel.
PR got merged into the astro-runtime repo; currently working on the test case and testing it in dev.
Asked @astronaut-chris about the timeline for getting this log into Splunk; this was his response and the planned items he has to work on:
- A new runtime would have to be released, where your TaskInstance Details log messages would start being emitted
- A Splunk query would have to be written/agreed upon, that could get the distinct TaskInstance Details log messages over a given period
- We would add an ingestion task to our Splunk DAG
- We would join these details into our task_runs table.
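The Splunk query in the second step amounts to the aggregation sketched below: deduplicate/count "TaskInstance Details" messages by classpath over a period. The log-line format here is an assumption (it depends on what the runtime listener actually emits), and the parsing helper is hypothetical.

```python
# Sketch: aggregating "TaskInstance Details" worker-log lines into usage
# counts per operator classpath — the shape of result the planned Splunk
# query would return. The line format below is assumed for illustration.
import re
from collections import Counter

LINE_RE = re.compile(r"TaskInstance Details: .*classpath=(\S+)")

def usage_counts(log_lines):
    """Count occurrences of each operator classpath in worker log lines."""
    counts = Counter()
    for line in log_lines:
        m = LINE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

sample = [
    "TaskInstance Details: dag_id=etl task_id=load classpath=airflow.operators.bash.BashOperator",
    "TaskInstance Details: dag_id=etl task_id=wait classpath=airflow.sensors.base.BaseSensorOperator",
    "TaskInstance Details: dag_id=ml task_id=run classpath=airflow.operators.bash.BashOperator",
    "some unrelated worker log line",
]
print(usage_counts(sample))
```

The joined result would then feed the task_runs table mentioned in the last step.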
@bharanidharan14 What's the latest on this ticket? Can you ensure that this is updated as soon as we make progress on the task?
Connected with @astronaut-chris regarding the task: once the latest astro-runtime image is released, he should be able to pick up the log and start on ticket DT-587.
Data team ticket : https://astronomer.atlassian.net/browse/DT-587
Waiting on the changes by Data team.
Hello! Even though many deployments appear to have adopted the 6.x runtime, it looks like we're only picking up events in dev. Please see below.
For context, we pick up TaskInstance Finished logs from the prod-schedulers index.
Can you dig into why this is not exporting to the correct index?
Connected with @astronaut-chris; we tried to get some information out of @rossturk 's POC lineage data in the sandbox as an early analysis. By querying the table, we were able to get the whole classpath, the most-used operators, their parent classes, and the operator names.
I have updated the query, and the graph we got by running it on the POC lineage data, here: https://www.notion.so/astronomerio/Approach-to-find-usage-metrics-of-Operators-Sensors-947dd9d9968444fe984ee34ef6c4a420
https://github.com/astronomer/analyses/blob/main/analyses/needed-async-operators/needed-async-operators.R
Connected with @uranusjr regarding how to capture the first-level inheritance of a custom operator; based on the discussion, he suggested the following steps:
- Add a field to the task instance, before the operator is serialized, that captures the inheritance chain (module paths) as a string in the task instance table.
- Try this as a POC and verify that we get this info in the TaskInstance table and that the data is reflected in Splunk.
Created PR: https://github.com/astronomer/astro-runtime/pull/524