openverse
openverse copied to clipboard
Set up Airflow Variable defaults with descriptions automatically
Description
We have a number of Airflow Variables which act as levers for some of our DAGs, allowing us to tweak settings for certain DAGs without having to restart the stack. At present, the documentation for them is spread out and not easily accessible:
For this (and any other Variables that it might seem useful for), we should set up a JSON file with a default value and an adequate description for both what the shape of the expected values are and how this affects the related DAGs. This can effectively serve as documentation for the Variables above.
We can then use airflow variables import (with the --action-on-existing-key set to skip) as part of our entrypoint.sh to ensure it runs both locally and in production on every startup. In production, if the keys already exists, we'll just need to add the descriptions to them once since we won't be overwriting the values.
In production, if the keys already exists, we'll just need to add the descriptions to them once since we won't be overwriting the values.
Hi @AetherUnbound, this looks like something I would like to take on.
Regarding the statement above, is this something that needs to be done in entrypoint.sh also? If yes, does airflow variables import provide a mechanism to read or access the skipped existing keys?
Secondly, where exactly would the addition of the description be done, In the in-memory loaded JSON or in the source JSON file?
Hey @madewithkode! I think that step would necessarily need to be completed by the maintainers. Once we have the Variables documented, we can go in one by one for a one-off update of all the existing Variables to add descriptions. So please do feel free to take on the code work for this PR, and we'll handle the description adding in production once it's ready!
Noted. Would open a draft PR and let you know once I come up with something.
Hi @AetherUnbound does the following JSON file snippet mirror the goal of this task? If yes, where can I find the suitable defaults? and if no, can you throw more light on what's required?
{
"SILENCED_SLACK_NOTIFICATIONS": {
// A link to a GitHub issue which describes why the notification
//is silenced and tracks resolving the problem.
"issue": "",
// Slack notifications whose text or username contain the
//predicate will be silenced. Matching is case-insensitive.
"predicate": "",
// A regex pattern that matches the task_id of the task
// that triggered the notification. (Optional)
"task_id_pattern": ""
},
}
Alright, so I tried to do some digging on what the JSON file actually looked like, and wasn't able to find any in the documentation linked above. Instead, I added a variable locally then ran the following:
$ airflow@ef2c68b81811:/opt/airflow$ airflow variables export -
{
"SAMPLE_VARIABLE": "{\"sample_key\": true, \"other_key: 1, \"final_key\": [\"a\"]}"
}
1 Variables successfully exported.
So...even though this Variable has a description, there's no way to export/import that description using JSON files and airflow variables import 😭
However, it does look like we can use airflow variable set to apply both:
$ airflow@ef2c68b81811:/opt/airflow$ airflow variables set --help
Usage: airflow variables set [-h] [--description DESCRIPTION] [-j] [-v] key VALUE
Set variable
Positional Arguments:
key Variable key
VALUE Variable value
Options:
-h, --help show this help message and exit
--description DESCRIPTION
Variable description, optional when setting a variable
-j, --json Serialize JSON variable
-v, --verbose Make logging output more verbose
The only trouble here is that there aren't any options for how to handle existing Variables, which leads me to believe it'll automatically overwrite if this command is run. We can determine that first though with airflow variables list:
$ airflow@ef2c68b81811:/opt/airflow$ airflow variables list -o plain
key
SAMPLE_VARIABLE
OTHER
So here's what I think are the next steps:
- We create a
variables.csvfile, potentially right undercatalog/, that has the following columns:key,value,description. - We fill out the values for each of those Variables described above (I'll follow up with a second comment on what the default values and description for each might actually look like)
- We add some logic to the
entrypoint.shwhich first runsairflow variables list -o plainto get the list of existing Variables, then iterate through each row ofvariables.csvand only runairflow variables set --description <description> <key> <value>if the key doesn't already exist in the database. That should prevent it from overriding existing values.
This is all admittedly way more complicated than initially expected @madewithkode 😅 And before we go with that approach, I want to make sure it makes sense with @stacimc!
Wow -- interesting find. Disappointing we can't import them so easily, but the next steps @AetherUnbound listed do make sense to me, and I think this feature is very much worth it!
Hi @AetherUnbound nice find and recommendations, I think this is still well within my capacity 😅, would sure be an interesting problem to tackle. I'd await your comment on what the default values and description for each of the variables would look like.
@madewithkode I've created the TSV (it's saved as a CSV here because GitHub doesn't accept TSV uploads directly). It is using a tab for delimiting the columns. Note that the description field has double quotes (") in it - the description field may need to be wrapped in single quotes when it's handed to the command line arguments for airflow variable set. Let me know if you have any other questions!
variables.csv
Just following up to note that I've added the descriptions to production as well 🚀