meltano
meltano copied to clipboard
feat: add support for `env` key for `jobs`
See discussion in slack: https://meltano.slack.com/archives/C01TCRBBJD7/p1657233220239139
Consider example:
jobs:
- name: gitlab
tasks:
- dbt-postgres:seed tap-gitlab target-postgres dbt-postgres:snapshot dbt-postgres:run dbt-postgres:test
meltano run gitlab
will run all the dbt commands for the whole project rather than allowing me to specify only the specific models that have changed (e.g. DBT_MODELS=+gitlab+
).
I realize this can be solved by schedules:
schedules:
- name: gitlab
interval: '@daily'
job: gitlab
env:
DBT_MODELS: +gitlab+
However it seems to me that this information belongs to the job itself, not the schedule; that is even if I were to manually run meltano run gitlab
outside of the @daily
schedule, I’d still expect it to apply my configuration.
Proposal: Allow env
s to be set on a job basis:
jobs:
- name: gitlab
tasks:
- dbt-postgres:seed tap-gitlab target-postgres dbt-postgres:snapshot dbt-postgres:run dbt-postgres:test
env:
DBT_MODELS: +gitlab+
TARGET_POSTGRES_BATCH_SIZE_ROWS: 100 # could also be useful to override plugin configs
Precedence can get tricky here, but I would expect:
meltano environment < job < schedule < .env < system envrionment
https://docs.meltano.com/guide/configuration#specifying-environment-variables is the current order
- environment level plugin env # highest
- environment level env
- root level plugin env
- root level env
- .env file
- terminal env # lowest
Assuming both jobs and schedules could specify env
I would expect the following:
- environment level plugin env # highest
- environment level env
- schedule level env
- job level env
- root level plugin env
- root level env
- .env file
- terminal env # lowest
my thinking is that schedules could run multiple instances of a job and one would be more inclined to override a setting for a given job on a per schedule basis rather than the other way around. Does that track?
All that said I'm in favor of adding env
for both jobs and schedules (I'll open a separate issue for schedules
) - the last precedence question will take a bit more debate. cc @aaronsteers
By a < b
I meant b has higher precedence than a but I can see how that was ambiguous. Yes, I agree that schedules should have higher precedence than jobs.
I'm a little surprised that system env vars are overwritten by environments (for example, I might do FOO=override meltano run
), but I don't think it's a problem and I'm sure y'all have considered it more than I have, and anyway that's a separate topic
I'm a little surprised that system env vars are overwritten by environments (for example, I might do FOO=override meltano run), but I don't think it's a problem and I'm sure y'all have considered it more than I have, and anyway that's a separate topic
@rabidaudio feel free to open a new issue or slack thread if you want to discuss - but what I'll quickly say is that you can still get this behavior I think if you had
environments:
prod:
env:
FOO: $MY_OVERRIDE
and then set MY_OVERRIDE
or not in the system env. We did try to put a lot of thought into it and it may be worth adding some of that context in https://docs.meltano.com/guide/configuration#specifying-environment-variables 🤔
@tayloramurphy @rabidaudio I landed on this while working on https://github.com/meltano/squared/issues/289 where I was trying to get away from exporting env vars before running a command like in this case https://github.com/meltano/squared/blob/64cffc745eba6721fb97490d5aa14f6f7a970afc/data/orchestrate/dag_definition.yml#L146 when using schedules/jobs. I was a able to solve it without schedule/job env vars but was about to request the same feature before I found this issue so I was giving it some thought also:
@rabidaudio I know this is just a solution for this particular problem but what do you think about putting your dbt model selection criteria in a custom command (Squared example)? I've been exploring this pattern so that every set of dbt selection criteria is checked in as code. I know its bit me in the past when selection criteria gets really precise and I either include or exclude something I shouldnt. Did you consider this for youre use case already? Is there any cons that I might be missing other than verboseness?
@cjohnhanson - Assigning to you for scoping/analysis as related to #6387. Please confirm the spec - especially resolution order - before starting dev.
@tayloramurphy -- there's work in progress (see #5982) to update the precedence order to be in line with this spec in Miro. New precedence order would be:
- environment level plugin env # highest
- root level plugin env
- environment level env
- root level env
- .env file
- terminal env # lowest
Supporting env
in schedules and jobs will happen after that work is completed. So in the new precedence order, where would job and schedule envs fit? Would it be:
A)
- environment level plugin env # highest
- root level plugin env
- schedule level env
- job level env
- environment level env # job and schedule take precedence over active environment
- root level env
- .env file
- terminal env # lowest
or B)
- environment level plugin env # highest
- root level plugin env
- environment level env # active environment takes precedence over job and schedule
- schedule level env
- job level env
- root level env
- .env file
- terminal env # lowest
Or something else?
@cjohnhanson I think this would be the best order:
- environment level plugin env # highest
- schedule level env
- job level env
- root level plugin env
- environment level env
- root level env
- .env file
- terminal env # lowest
My logic here is similar to what I said in https://github.com/meltano/meltano/issues/6386#issuecomment-1179138398 But basically, a job is a collection of plugins run at the same time and users would likely want to override plugin settings at the job level. Similarly, a schedule is a particular instance of a job and there can be multiple schedules per job, so overrides from there make sense above jobs. Then having the environment level plugin env as the highest priority is the last place you could override anything set lower in the hierarchy.
@aaronsteers would you agree on this?
@aaronsteers @cjohnhanson I'm pulling this out of the next iteration. While I think this would be a useful thing to have, I don't think we need to currently build it ourselves. Thinking through the spec discussion has been useful and clarifying in relation to #6387 but adding support for env
at the jobs level isn't critical right now.
I'll mark it as accepting PR's though in case the community is interested in having this.
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen
label, or request that it be added.
More users interested in this in Slack: https://meltano.slack.com/archives/C01TCRBBJD7/p1690894749896989?thread_ts=1690894749.896989&cid=C01TCRBBJD7