dagster dev env var

Open alangenfeld opened this issue 1 year ago • 7 comments

Set an env var in dagster dev that can be used to control behavior that should only run during local development, similar to how one might use DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT / DAGSTER_CLOUD_DEPLOYMENT_NAME.

How I Tested These Changes

updated e2e test to include a check

alangenfeld avatar Feb 12 '24 17:02 alangenfeld

  • #19925 Graphite
  • #19923 Graphite
  • #19748 Graphite 👈
  • master

alangenfeld avatar Feb 12 '24 17:02 alangenfeld

Wonder if we should standardize on IS_XXX:

  • DAGSTER_IS_LOCAL_DEVELOPMENT
  • DAGSTER_IS_FULL_DEPLOYMENT
  • DAGSTER_IS_BRANCH_DEPLOYMENT

Having this consistency would be nice for the user, if they want to implement functionality around these environment variables.

rexledesma avatar Feb 12 '24 18:02 rexledesma

My thinking was that this is a pretty good though imperfect indicator of being in "local development". There are cases where someone could be doing local dev where dagster dev did not start the code servers. I do think it's pretty unlikely that someone would be powering a "real" deployment using dagster dev.

I have some hesitancy with a name like DAGSTER_IS_LOCAL_DEVELOPMENT given those corner cases where it won't be set despite developing locally.

The direction I am playing around with is how to move stuff like what we currently recommend for DBT https://docs.dagster.io/integrations/dbt/using-dbt-with-dagster/load-dbt-models#step-4-understand-the-python-code-in-your-dagster-project into a more declarative API where we can pull this env var poking under our implementation. With something like that, it's more realistic to cross-reference a few different env vars to try to triangulate what state we believe we are in.

alangenfeld avatar Feb 12 '24 19:02 alangenfeld

How would one use this exactly?

ion-elgreco avatar Feb 12 '24 22:02 ion-elgreco

I've definitely heard of at least one user deploying dagster in 'prod' by running "dagster dev" in an EC2 box or something, but I agree that is not something to be encouraged or to make significant design decisions around.

gibsondan avatar Feb 12 '24 22:02 gibsondan

We have this snippet we use:

import os
from enum import Flag, auto


class DeploymentType(Flag):
    PROD = auto()
    BRANCH = auto()
    LOCAL = auto()

    # Composite flags for membership checks, e.g. `env in DeploymentType.DEV`.
    CLOUD = PROD | BRANCH
    DEV = BRANCH | LOCAL


def get_current_env() -> DeploymentType:
    # No deployment name set: treat the process as running locally.
    if "DAGSTER_CLOUD_DEPLOYMENT_NAME" not in os.environ:
        return DeploymentType.LOCAL
    # Branch deployments are flagged with "1".
    if os.getenv("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
        return DeploymentType.BRANCH
    return DeploymentType.PROD

If you are planning on switching to the consistent DAGSTER_IS_XXX format, it might be worth first adding a utility like this which works with both formats given it's fairly hard to flag deprecation warnings at runtime for environment variable usage. Also makes it easier to add modes somewhere down the line.

Plus I imagine every other repo has something to this effect, so there's something to be said for centralising the boilerplate.
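
A minimal sketch of what such a dual-format utility could look like. It assumes the unified name would be DAGSTER_IS_BRANCH_DEPLOYMENT, which is only a proposal in this thread, and that "1" is the truthy value (the only value the snippet above checks for). The helper names read_flag and is_branch_deployment are hypothetical, not part of Dagster.

```python
import os
from typing import Mapping, Optional


def read_flag(
    new_name: str,
    legacy_name: str,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if the flag is set to "1" under either name.

    The new-style name takes precedence when both are present, so a
    deployment can migrate one variable at a time.
    """
    env = os.environ if env is None else env
    for name in (new_name, legacy_name):
        value = env.get(name)
        if value is not None:
            return value == "1"
    return False


def is_branch_deployment(env: Optional[Mapping[str, str]] = None) -> bool:
    # DAGSTER_IS_BRANCH_DEPLOYMENT is the proposed (hypothetical) name;
    # DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT is the name used earlier in this thread.
    return read_flag(
        "DAGSTER_IS_BRANCH_DEPLOYMENT",
        "DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT",
        env,
    )
```

Passing the environment as an argument keeps the helper easy to unit test without mutating os.environ.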

mjclarke94 avatar Feb 13 '24 00:02 mjclarke94

@gibsondan A simple use case is when we add jobs that should only be present in local/branch deployments for creating/dropping zero copy clones of our prod database:

# Register the branch-only jobs in local and branch deployments.
if get_current_env() in DeploymentType.DEV:
    branch_jobs = [g.to_job(resource_defs=RESOURCES) for g in BRANCH_GRAPHS]
    jobs.extend(branch_jobs)

We also use it for flow control when certain config values change depending on the environment you're in:

match current_env:
    case DeploymentType.BRANCH:
        branch_name = os.getenv("DAGSTER_CLOUD_GIT_BRANCH")
        branch_id = os.getenv("DAGSTER_CLOUD_PULL_REQUEST_ID")
        SNOWFLAKE_BRANCH = f"{SNOWFLAKE_PROD}_BRANCH_{branch_name}_{branch_id}".upper().replace("-", "_")

    case DeploymentType.LOCAL:
        branch_name = get_git_branch()
        SNOWFLAKE_BRANCH = f"{SNOWFLAKE_PROD}_{branch_name}_{getpass.getuser()}_LOCAL".upper().replace("-", "_")

    case DeploymentType.PROD:
        SNOWFLAKE_BRANCH = SNOWFLAKE_PROD

    case _:
        raise ValueError("Unknown deployment type encountered.")

On second glance, I'd be inclined to make get_current_env() a class method of the DeploymentType enum so you could then do something like:

from dagster.utils import DeploymentType

if DeploymentType.current() in DeploymentType.CLOUD:
    print("I'm running in the cloud!")
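
A sketch of how that class method could be written, reusing the detection logic from the earlier snippet. The dagster.utils import above is a suggestion rather than an existing module path, so the enum is defined locally here; the optional env parameter is an addition for testability and is not part of the original proposal.

```python
import os
from enum import Flag, auto
from typing import Mapping, Optional


class DeploymentType(Flag):
    PROD = auto()
    BRANCH = auto()
    LOCAL = auto()

    CLOUD = PROD | BRANCH
    DEV = BRANCH | LOCAL

    @classmethod
    def current(cls, env: Optional[Mapping[str, str]] = None) -> "DeploymentType":
        """Infer the deployment type from environment variables."""
        env = os.environ if env is None else env
        # No deployment name set: treat the process as running locally.
        if "DAGSTER_CLOUD_DEPLOYMENT_NAME" not in env:
            return cls.LOCAL
        if env.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1":
            return cls.BRANCH
        return cls.PROD


if DeploymentType.current() in DeploymentType.CLOUD:
    print("I'm running in the cloud!")
```

Because the members are Flag values, the "in" check is a bitwise containment test, so a single member such as BRANCH matches both CLOUD and DEV.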

mjclarke94 avatar Feb 13 '24 22:02 mjclarke94

Current iteration is DAGSTER_IS_DEV_CLI, to bring it more in line with DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT while remaining precise about when it is set. Open to being convinced of another name.

Example use can be seen in upstack (currently draft) PRs
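
For illustration, a minimal sketch of how user code might gate on this variable. It assumes only that dagster dev sets DAGSTER_IS_DEV_CLI to some non-empty value in the processes it launches; the exact value is not stated in this thread, so the check tests for presence rather than comparing against a specific string. The helper names and the schema names are hypothetical.

```python
import os
from typing import Mapping, Optional


def launched_by_dev_cli(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when DAGSTER_IS_DEV_CLI is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("DAGSTER_IS_DEV_CLI"))


def target_schema(env: Optional[Mapping[str, str]] = None) -> str:
    # Hypothetical example: write to a scratch schema under `dagster dev`.
    return "dev_scratch" if launched_by_dev_cli(env) else "analytics"
```

An unset variable does not by itself imply a deployed environment: as noted earlier in the thread, it would also be unset when the code servers were not started by dagster dev.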

alangenfeld avatar Feb 20 '24 22:02 alangenfeld

Definitely DAGSTER_IS_DEV_CLI rather than DAGSTER_IS_LOCAL_DEVELOPMENT, as it describes precisely what is happening rather than trying to divine what is in the user's mind. Beyond the edge case of someone using dagster dev on a deployed EC2 box, people might run dagster dev locally to manually run jobs and think of that as production. There are also many cases where you can be doing "local development" (e.g. running a unit test) without using dagster dev.

schrockn avatar Feb 21 '24 13:02 schrockn
