[pipes] JsonSchema for externals protocol

Open smackesey opened this issue 2 years ago • 8 comments

Summary & Motivation

Add a script that generates a JSON schema for the externals protocol.

The script uses pydantic and lives in the top-level scripts directory. It writes the JSON schema to python_modules/dagster-ext/json_schema/{context,message}.json. Because the script requires pydantic v2, it must be run through tox -e jsonschema (from dagster-externals) until core supports pydantic v2.

I wasn't sure how to represent a combined schema for context and message, so I put them in separate schema files.
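For readers who have not used pydantic v2 for this, here is a minimal sketch of the general approach, not the script in this PR. The model names (ExampleContext, ExampleMessage), their fields, and the write_schemas helper are all hypothetical stand-ins for the real protocol types; the only real library call is pydantic v2's model_json_schema(). It writes one schema file per type, matching the context/message split described above.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Optional

from pydantic import BaseModel  # requires pydantic v2


class ExampleContext(BaseModel):
    # Hypothetical fields; the real context type lives in the package.
    run_id: str
    job_name: Optional[str] = None
    extras: Dict[str, Any] = {}


class ExampleMessage(BaseModel):
    # Hypothetical fields; the real message type lives in the package.
    method: str
    params: Optional[Dict[str, Any]] = None


def write_schemas(out_dir: Path) -> Dict[str, Path]:
    """Write one JSON schema file per protocol type and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for name, model in (("context", ExampleContext), ("message", ExampleMessage)):
        path = out_dir / f"{name}.json"
        schema = model.model_json_schema()
        # sort_keys gives a stable file, which keeps later diffs meaningful.
        path.write_text(json.dumps(schema, indent=2, sort_keys=True) + "\n")
        written[name] = path
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name, path in write_schemas(Path(tmp)).items():
            print(name, "->", path.name)
```

Serializing with sorted keys and a trailing newline is a choice made for this sketch so that regenerating the files is deterministic.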

Also adds a BK step that generates the schema and diffs it against the checked-in version, ensuring nothing has changed.
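A rough sketch of what such a check can look like, using only the standard library. This is an illustration of the idea, not the actual BK step; the function names (render, schema_diff) and the sample schema are hypothetical. It renders the freshly generated schema the same way it would be written to disk and produces a unified diff against the checked-in file, so an empty diff means nothing has changed.

```python
import difflib
import json
import tempfile
from pathlib import Path


def render(schema: dict) -> str:
    """Serialize a schema the same way the (hypothetical) generator does."""
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


def schema_diff(generated: dict, checked_in_path: Path) -> str:
    """Return a unified diff; an empty string means the file is up to date."""
    expected = checked_in_path.read_text().splitlines(keepends=True)
    actual = render(generated).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            expected, actual, fromfile=str(checked_in_path), tofile="generated"
        )
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checked_in = Path(tmp) / "context.json"
        # Hypothetical schema standing in for the real generated output.
        schema = {"type": "object", "properties": {"run_id": {"type": "string"}}}
        checked_in.write_text(render(schema))

        # Simulate a protocol change that was not regenerated and committed.
        schema["properties"]["job_name"] = {"type": "string"}
        diff = schema_diff(schema, checked_in)
        if diff:
            print(diff)
            raise SystemExit("schema is out of date; regenerate and commit it")
```

A CI step would exit non-zero whenever the diff is non-empty, which is what forces the checked-in schema to stay in sync with the code.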

The schema files are also included in the built dagster-pipes package.

How I Tested These Changes

New unit tests ensure that the JSON schema is valid and that context/message objects satisfy it.

smackesey avatar Aug 22 '23 13:08 smackesey

maybe we should wait until the pydantic 2 move so we never have to check in this workaround/complexity?

Fine with me. That said, the workaround is only a single line (pydantic>2 in tox.ini), and I believe everything else about this PR will stay the same after core supports pydantic 2.

smackesey avatar Aug 23 '23 23:08 smackesey

if I'm writing ext in Scala and I want to use the json schema, what does that look like?

Depends on what our Scala ext integration story looks like. If we have a dedicated lib then we would include this schema in that lib. If we don't then we could provide a CLI method to access it. Either way it falls to whatever JSON schema libs are available in Scala to actually perform validation.

Alternatively we could publish the schema to a public URL and just expose that.

smackesey avatar Sep 21 '23 16:09 smackesey

Depends on what our Scala ext integration story looks like. If we have a dedicated lib then we would include this schema in that lib. If we don't then we could provide a CLI method to access it. Either way it falls to whatever JSON schema libs are available in Scala to actually perform validation.

Right. I don't think we need to commit to this schema right now, and I think it makes sense to do this when we actually build our first non-Python integration. So my proposal is that we resurrect this diff when we write our first prototype in another language.

schrockn avatar Sep 22 '23 14:09 schrockn