dagster dev env var
Set an env var in dagster dev that can be used to control behavior that should only run during local development, similar to how one might use DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT / DAGSTER_CLOUD_DEPLOYMENT_NAME.
How I Tested These Changes
updated e2e test to include a check
Wonder if we should standardize on IS_XXX:
- DAGSTER_IS_LOCAL_DEVELOPMENT
- DAGSTER_IS_FULL_DEPLOYMENT
- DAGSTER_IS_BRANCH_DEPLOYMENT
Having this consistency would be nice for the user, if they want to implement functionality around these environment variables.
My thinking was that this is a pretty good, though imperfect, indicator of being in "local development". There are cases where someone could be doing local dev where dagster dev did not start the code servers. I do think it's pretty unlikely that someone would be powering a "real" deployment using dagster dev.
I have some hesitancy with a name like DAGSTER_IS_LOCAL_DEVELOPMENT given those corner cases where it won't be set despite developing locally.
The direction I am playing around with is how to move stuff like what we currently recommend for DBT https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/load-dbt-models#step-4-understand-the-python-code-in-your-dagster-project into a more declarative API where we can pull this env var poking under our implementation. With something like that, it's more realistic to cross-reference a few different env vars to try to triangulate what state we believe we are in.
How would one use this exactly?
i've definitely heard of at least one user deploying dagster in 'prod' by running "dagster dev" in an EC2 box or something, but I agree that is not something to be encouraged or to make significant design decisions around
We have this snippet we use:
import os
from enum import Flag, auto


class DagsterDeploymentType(Flag):
    PROD = auto()
    BRANCH = auto()
    LOCAL = auto()
    CLOUD = PROD | BRANCH
    DEV = BRANCH | LOCAL


def get_current_env() -> DagsterDeploymentType:
    # No deployment name means we are not running in Dagster Cloud.
    if "DAGSTER_CLOUD_DEPLOYMENT_NAME" not in os.environ:
        return DagsterDeploymentType.LOCAL
    if os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
        return DagsterDeploymentType.BRANCH
    return DagsterDeploymentType.PROD
If you are planning on switching to the consistent DAGSTER_IS_XXX format, it might be worth first adding a utility like this which works with both formats, given it's fairly hard to flag deprecation warnings at runtime for environment variable usage. Also makes it easier to add modes somewhere down the line.
Plus I imagine every other repo has something to this effect, so there's something to be said for centralising the boilerplate.
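A utility that works with both formats could look something like the minimal sketch below. The unprefixed DAGSTER_IS_BRANCH_DEPLOYMENT name is only the proposal from this thread, not an existing variable, and env_flag / is_branch_deployment are hypothetical helpers. Treating "1" as the truthy value follows the snippet above.

```python
import os
from typing import Mapping, Optional


def env_flag(*names: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any of the named variables is set to "1"."""
    # Accept an explicit mapping so the helper can be tested without
    # mutating the real process environment.
    env = os.environ if env is None else env
    return any(env.get(name) == "1" for name in names)


def is_branch_deployment(env: Optional[Mapping[str, str]] = None) -> bool:
    # Assumption: DAGSTER_IS_BRANCH_DEPLOYMENT is the proposed new name;
    # DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT is the name already in use.
    return env_flag(
        "DAGSTER_IS_BRANCH_DEPLOYMENT",
        "DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT",
        env=env,
    )
```

Callers that go through a helper like this would not need to change if the variable names were later consolidated.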
@gibsondan A simple use case is when we add jobs that should only be present in local/branch deployments for creating/dropping zero copy clones of our prod database:
if get_current_env() in DeploymentType.DEV:
    branch_jobs = [g.to_job(resource_defs=RESOURCES) for g in (BRANCH_GRAPHS)]
    jobs.extend(branch_jobs)
We also use it for flow control when certain config values change depending on the environment you're in:
match current_env:
    case DeploymentType.BRANCH:
        branch_name = os.getenv("DAGSTER_CLOUD_GIT_BRANCH")
        branch_id = os.getenv("DAGSTER_CLOUD_PULL_REQUEST_ID")
        SNOWFLAKE_BRANCH = f"{SNOWFLAKE_PROD}_BRANCH_{branch_name}_{branch_id}".upper().replace("-", "_")
    case DeploymentType.LOCAL:
        branch_name = get_git_branch()
        SNOWFLAKE_BRANCH = f"{SNOWFLAKE_PROD}_{branch_name}_{getpass.getuser()}_LOCAL".upper().replace("-", "_")
    case DeploymentType.PROD:
        SNOWFLAKE_BRANCH = SNOWFLAKE_PROD
    case _:
        raise ValueError("Unknown deployment type encountered.")
On second glance, I'd be inclined to make get_current_env() a class method of the DeploymentType enum so you could then do something like:
from dagster.utils import DeploymentType

if DeploymentType.current() in DeploymentType.CLOUD:
    print("I'm running in the cloud!")
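For illustration, the class-method version might be sketched as follows. This is not something dagster ships: the dagster.utils import above is hypothetical, so the enum is defined locally here. The optional env parameter is an added assumption that makes the logic testable without touching os.environ.

```python
import os
from enum import Flag, auto
from typing import Mapping, Optional


class DeploymentType(Flag):
    PROD = auto()
    BRANCH = auto()
    LOCAL = auto()
    CLOUD = PROD | BRANCH
    DEV = BRANCH | LOCAL

    @classmethod
    def current(cls, env: Optional[Mapping[str, str]] = None) -> "DeploymentType":
        """Infer the deployment type from Dagster Cloud environment variables."""
        env = os.environ if env is None else env
        # Same decision order as get_current_env() earlier in the thread.
        if "DAGSTER_CLOUD_DEPLOYMENT_NAME" not in env:
            return cls.LOCAL
        if env.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
            return cls.BRANCH
        return cls.PROD
```

Because the members are flags, membership checks such as DeploymentType.current() in DeploymentType.DEV cover both branch and local environments with one expression.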
Current iteration is DAGSTER_IS_DEV_CLI, to get it more in line with DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT but remain precise about when this is set. Open to being convinced of another name.
Example use can be seen in upstack (currently draft) PRs
Definitely DAGSTER_IS_DEV_CLI rather than DAGSTER_IS_LOCAL_DEVELOPMENT, as it describes what is happening precisely rather than trying to divine what is in the user's mind. Beyond the edge case of someone using dagster dev in a deployed EC2 box, people might run dagster dev locally to manually run jobs and think of that as production. Also, there are many cases where you can be doing "local development" (e.g. running a unit test) but not using dagster dev.
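As a rough sketch of how user code might consume the variable: the thread does not state what value dagster dev assigns, so the check below assumes that any non-empty value means the process was started by the dev CLI. started_by_dev_cli is a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def started_by_dev_cli(env: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether DAGSTER_IS_DEV_CLI is present in the environment."""
    env = os.environ if env is None else env
    # Assumption: any non-empty value counts as "set by dagster dev".
    return bool(env.get("DAGSTER_IS_DEV_CLI"))
```

In line with the naming discussion above, this only says that dagster dev launched the process. It does not establish that the user considers the run to be local development.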