dagster
[pipes] JsonSchema for externals protocol
Summary & Motivation
Add a script that generates a JSON schema for the externals protocol.
The script uses pydantic and lives in the top-level scripts directory. It writes the JSON schemas to python_modules/dagster-ext/json_schema/{context,message}.json. The script requires pydantic v2, so it must be run through tox -e jsonschema (from dagster-externals) until core is updated.
I wasn't sure how to represent a combined schema for context and message, so I put them in separate schema files.
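The generation script can be sketched roughly as follows. This is a minimal illustration, assuming pydantic v2 is installed; the model names and fields here are hypothetical stand-ins, not the actual dagster-ext models.

```python
# Hypothetical sketch of the schema-generation script. ExtContextData and
# ExtMessage are illustrative stand-ins for the real protocol models.
import json
from pathlib import Path
from typing import Optional

from pydantic import BaseModel


class ExtContextData(BaseModel):  # stand-in for the real context model
    run_id: str
    asset_key: Optional[str] = None


class ExtMessage(BaseModel):  # stand-in for the real message model
    method: str
    params: Optional[dict] = None


OUTPUT_DIR = Path("python_modules/dagster-ext/json_schema")


def write_schemas(output_dir: Path = OUTPUT_DIR) -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    # model_json_schema() is the pydantic v2 API (it was .schema() in v1),
    # which is why the script needs the tox env until core supports v2.
    for name, model in [("context", ExtContextData), ("message", ExtMessage)]:
        schema = model.model_json_schema()
        (output_dir / f"{name}.json").write_text(json.dumps(schema, indent=2))
```

Keeping context and message as separate models maps naturally onto the two separate schema files.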
Also adds a Buildkite step that generates the schema and diffs it against the checked-in version, ensuring nothing has changed.
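The drift check that step performs amounts to the following. This is an illustrative equivalent, not the actual CI code, which regenerates via the tox env and diffs against the checked-in files.

```python
# Minimal sketch of the drift check: regenerate the schema and compare it to
# the checked-in file, failing the build on any difference. (The real
# Buildkite step shells out to the generation script and diffs; this is an
# in-memory equivalent for illustration.)
import json
from pathlib import Path


def check_schema_drift(checked_in_path: Path, regenerated: dict) -> None:
    committed = json.loads(checked_in_path.read_text())
    if committed != regenerated:
        raise SystemExit(
            f"{checked_in_path} is out of date; regenerate and commit the schema"
        )
```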
The schema files are also included in the built dagster-pipes package.
How I Tested These Changes
New unit tests to ensure JSON schema is valid and that context/message objects satisfy it.
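The shape of those tests can be sketched as below. The schema and field names are hypothetical stand-ins for the real checked-in schema; the sketch assumes the jsonschema package for validation.

```python
# Illustrative sketch of the new unit tests. MESSAGE_SCHEMA is a stand-in,
# not the actual generated message schema.
import jsonschema

MESSAGE_SCHEMA = {
    "type": "object",
    "properties": {
        "method": {"type": "string"},
        "params": {"type": ["object", "null"]},
    },
    "required": ["method"],
}


def test_schema_is_valid():
    # raises jsonschema.SchemaError if the schema itself is malformed
    jsonschema.Draft7Validator.check_schema(MESSAGE_SCHEMA)


def test_message_satisfies_schema():
    # raises jsonschema.ValidationError if the object does not conform
    jsonschema.validate(
        {"method": "report_asset_metadata", "params": None}, MESSAGE_SCHEMA
    )
```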
Stack managed by Graphite (learn more about stacking):
- #16009
- #16633 👈 (this PR)
- master
maybe we should wait until the pydantic 2 move so we never have to check in this workaround/complexity?
Fine with me. That said, the workaround is only a single line (pydantic>2 in tox.ini); I believe everything else about this PR will stay the same after core supports pydantic 2.
if I'm writing ext in Scala and I want to use the json schema, what does that look like?
Depends on what our Scala ext integration story looks like. If we have a dedicated lib, then we would include this schema in that lib. If we don't, we could provide a CLI method to access it. Either way, it falls to whatever JSON schema libraries are available in Scala to actually perform validation.
Alternatively we could publish the schema to a public URL and just expose that.
> Depends on what our Scala ext integration story looks like. If we have a dedicated lib, then we would include this schema in that lib. If we don't, we could provide a CLI method to access it. Either way, it falls to whatever JSON schema libraries are available in Scala to actually perform validation.
Right. I don't think we need to commit to this schema right now, and I think it makes sense to do this when we actually build our first non-Python integration. So my proposal is that we resurrect this diff when we write our first prototype in another language.