airflow icon indicating copy to clipboard operation
airflow copied to clipboard

changing dag_processing.processes from UpDownCounter to guage

Open Bowrna opened this issue 1 year ago • 5 comments

issue: #36188 related: #36188


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Bowrna avatar Mar 22 '24 08:03 Bowrna

Add deprecation details around the old metric. Can you update the metrics documentation?

yes @dirrao Let me do it

Bowrna avatar Mar 23 '24 11:03 Bowrna

when setting the dag_processing.processes as gauge instead of the counter, we may need to initialize the value.

Gauge works in the following way:
gaugor:333|g
the above command sets the param `gaugor` to 333
gaugor:433|g
the above command sets the param `gaugor` to 433
gaugor:-10|g
the above command sets the param `gaugor` to 423
gaugor:+4|g
the above command sets the param `gaugor` to 427

So to keep increasing the process count as it starts and decrease the process count as it stops(finishes or timeouts or error outs), we first have to initialize it to zero value at the beginning. I could set the gauge to zero at the start of the airflow scheduler command. https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#running-more-than-one-scheduler But there is an option to run more than one scheduler. so setting the param as zero in the beginning of the airflow scheduler command wouldn't be the right way.

How do you think this case can be better handled? @potiuk @dirrao @ferruzzi @hussein-awala

Bowrna avatar Mar 25 '24 05:03 Bowrna

when setting the dag_processing.processes as gauge instead of the counter, we may need to initialize the value.

Gauge works in the following way:
gaugor:333|g
the above command sets the param `gaugor` to 333
gaugor:433|g
the above command sets the param `gaugor` to 433
gaugor:-10|g
the above command sets the param `gaugor` to 423
gaugor:+4|g
the above command sets the param `gaugor` to 427

So to keep increasing the process count as it starts and decrease the process count as it stops(finishes or timeouts or error outs), we first have to initialize it to zero value at the beginning. I could set the gauge to zero at the start of the airflow scheduler command. https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#running-more-than-one-scheduler But there is an option to run more than one scheduler. so setting the param as zero in the beginning of the airflow scheduler command wouldn't be the right way.

How do you think this case can be better handled? @potiuk @dirrao @ferruzzi @hussein-awala

@ferruzzi I think I will need help with the above part. If we are changing dag_processing.processes with a different name to gauge, I have to find out about initializing to zero value. Your views on this would be helpful.

Edit: I will need help in this part to get unblocked

Bowrna avatar Mar 28 '24 05:03 Bowrna

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar May 18 '24 00:05 github-actions[bot]

when setting the dag_processing.processes as gauge instead of the counter, we may need to initialize the value.

Gauge works in the following way:
gaugor:333|g
the above command sets the param `gaugor` to 333
gaugor:433|g
the above command sets the param `gaugor` to 433
gaugor:-10|g
the above command sets the param `gaugor` to 423
gaugor:+4|g
the above command sets the param `gaugor` to 427

So to keep increasing the process count as it starts and decrease the process count as it stops(finishes or timeouts or error outs), we first have to initialize it to zero value at the beginning. I could set the gauge to zero at the start of the airflow scheduler command. https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/scheduler.html#running-more-than-one-scheduler But there is an option to run more than one scheduler. so setting the param as zero in the beginning of the airflow scheduler command wouldn't be the right way. How do you think this case can be better handled? @potiuk @dirrao @ferruzzi @hussein-awala

@ferruzzi I think I will need help with the above part. If we are changing dag_processing.processes with a different name to gauge, I have to find out about initializing to zero value. Your views on this would be helpful.

Edit: I will need help in this part to get unblocked

@ferruzzi do I have any update on this part?

Bowrna avatar May 19 '24 10:05 Bowrna

@ferruzzi do I have any update on this part?

How would that work? A gauge is an arbitrary reading, you just tell it what the new/current value is. Unless you see some way for Airflow to read the last reported value from the metrics backend (StatsD, OTel, etc) you would need to store the last-reported value somewhere, increment/decrement it accordingly, then emit the new value. If you are doping that then it seems like the answer would be to add some logic which fetches thew last reported value and defaults to 0 if it isn't found.

ferruzzi avatar May 21 '24 16:05 ferruzzi

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 09 '24 00:09 github-actions[bot]