Allow any valid Pydantic field type to be used in Config
What's the use case?
Currently Dagster only allows primitive, collection, Config, or discriminated-union types in Pydantic configs. However, Pydantic provides many field types beyond these, as well as the ability to define custom field types. Ideally, anything that is valid in Pydantic should be valid for Dagster configs.
Alternatively, there should be a Dagster-specific way to create custom field types from the currently allowed field types. In Dagster 1.3.6 this is roughly half-implemented: if you subclass Config and yield a validator in __get_validators__ that accepts both Mapping and non-Mapping input (a string, for example), you can pass the value of the field in your preferred format directly to the Config / ConfiguredResource class. However, the subclass still needs to accept a Mapping input, and if the config is specified via run config, it has to be in the mapping format.
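Roughly, the pattern looks like this (sketch only, assuming Dagster 1.3.x with pydantic v1; ConnectionConfig and the "host:port" string format are made up for illustration):

from collections.abc import Mapping
from typing import Any

from dagster import Config


class ConnectionConfig(Config):
    host: str
    port: int

    @classmethod
    def __get_validators__(cls):
        # pydantic v1 hook: replace the default model validator with one that
        # also accepts a plain "host:port" string
        yield cls._validate

    @classmethod
    def _validate(cls, value: Any) -> "ConnectionConfig":
        if isinstance(value, cls):
            return value
        if isinstance(value, Mapping):
            # the mapping form is still required when the value comes from run config
            return cls(**value)
        if isinstance(value, str):
            host, port = value.rsplit(":", 1)
            return cls(host=host, port=int(port))
        raise TypeError(f"Cannot build {cls.__name__} from {value!r}")

With this, a field typed as ConnectionConfig can be populated from a string when the parent Config is constructed in Python, but run config still has to spell the field out as a mapping, which is the limitation described above.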
Ideas of implementation
I'm not sure what internals (if any) make the restriction to primitive, collection, Config, or discriminated-union types necessary, but ideally Dagster should simply offload validation of config fields to Pydantic. Anything that Pydantic accepts, Dagster should also accept.
Additional information
No response
Message from the maintainers
Impacted by this issue? Give it a 👍! We factor engagement into prioritization.
@benpankow I hacked your type system 😅
By creating a custom metaclass that inherits from ConfigScalar for my custom datatype and overriding _get_non_none_public_field_values in my resource, I was able to get my custom datatype to serialize and validate as a string while still appearing as my custom type on the resource object itself.
So the simplest implementation of this might be to have an ABC for setting up a custom type that only needs to know how to serialize the class into a config dict, and we can rely on Pydantic to validate that the input is correct.
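Something like this, purely as a hypothetical sketch (none of these names exist in Dagster; it just illustrates the shape of the ABC I have in mind):

from abc import ABC, abstractmethod
from typing import Any


class ConfigFieldType(ABC):
    """Hypothetical base class for custom config field types.

    A subclass only declares how to parse a raw config value and how to
    serialize itself back into a config dict; Pydantic handles validation.
    """

    @classmethod
    def __get_validators__(cls):
        # pydantic v1 hook: delegate parsing of raw input to the subclass
        yield cls.parse_config_value

    @classmethod
    @abstractmethod
    def parse_config_value(cls, value: Any) -> "ConfigFieldType":
        """Build an instance from whatever appears in run config."""

    @abstractmethod
    def to_config_value(self) -> Any:
        """Serialize this instance into a plain value for the config dict."""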
We ran into this recently. @pzarabadip, am I remembering correctly that datetime.timedelta instances weren't working? (Sorry for the ping, not at my laptop so can't check!)
Hey @ndellosa95. As @mjclarke94 mentioned, we've got a similar issue with timedelta. The following code snippet results in a validation failure:
from dagster import asset, Config, OpExecutionContext
from datetime import timedelta


class MyConfig(Config):
    input: timedelta


@asset
def test_timedelta(context: OpExecutionContext, config: MyConfig):
    return context.log.info("test")
with the following (trimmed) log:
Unable to resolve config type <class 'datetime.timedelta'> to a supported Dagster config type.

This config type can be a:
 - Python primitive type
 - int, float, bool, str, list
 - A Python Dict or List type containing other valid types
 - Custom data classes extending dagster.Config
 - A Pydantic discriminated union type (https://docs.pydantic.dev/usage/types/#discriminated-unions-aka-tagged-unions)
Hey @pzarabadip, @mjclarke94!
I'm not sure whether Dagster intends to support custom types, or types without 1:1 JSON serialization, but I'll use your example to showcase my workaround. The hack I came up with gets around this by 1) tricking the type system into thinking your type is a Dagster config type and 2) providing custom JSON serialization logic for fields of your type. So if you wanted to serialize your timedeltas as an integer of total seconds, here's what the code would look like (I haven't tested this, but it's largely cut and pasted from what I'm using myself):
from typing import Any, Mapping
from dagster import asset, Config, OpExecutionContext
from dagster._config.config_type import ConfigScalar, ConfigScalarKind
from datetime import timedelta


class _DagsterConfigTimedeltaMeta(type, ConfigScalar):
    def __init__(cls, name, bases, dct):
        type.__init__(cls, name, bases, dct)
        ConfigScalar.__init__(cls, key="timedelta", given_name="Timedelta", scalar_kind=ConfigScalarKind.INT)


class DagsterConfigTimedelta(timedelta, metaclass=_DagsterConfigTimedeltaMeta):
    @classmethod
    def _make_timedelta_from_seconds(cls, value):  # Self type return
        if isinstance(value, cls):
            return value
        if isinstance(value, int):
            return cls(seconds=value)
        raise ValueError  # your error message here

    @classmethod
    def __get_validators__(cls):
        yield cls._make_timedelta_from_seconds


class MyConfig(Config):
    input: DagsterConfigTimedelta

    def _get_non_none_public_field_values(self) -> Mapping[str, Any]:
        """Returns a dictionary representation of this config object,
        ignoring any private fields, and any optional fields that are None.

        Inner fields are returned as-is in the dictionary,
        meaning any nested config objects will be returned as config objects, not dictionaries.
        """

        def map_timedelta_field(field_key: str, value):
            field = self.__fields__.get(field_key)
            if field and issubclass(field.type_, DagsterConfigTimedelta):
                return value.total_seconds()
            return value

        return {k: map_timedelta_field(k, v) for k, v in super()._get_non_none_public_field_values().items()}


@asset
def test_timedelta(context: OpExecutionContext, config: MyConfig):
    return context.log.info("test")
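Also untested, but for completeness, this is roughly how I'd expect the run config to look once the above is in place (with the int-of-seconds encoding):

from dagster import materialize

# hypothetical usage sketch: the field is supplied as plain seconds in run config
materialize(
    [test_timedelta],
    run_config={"ops": {"test_timedelta": {"config": {"input": 3600}}}},
)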
This workaround is very neat to see! Right now, the chief constraint on the Pydantic config layer is that it's translated to the older Dagster config system under the hood - as you're doing by extending ConfigScalar in your example.
I think it's not an unreasonable ask to introduce a more native way to tell Dagster which Dagster config type can be used to represent the field type, and a way to convert to and from that field type (e.g. providing timedelta <-> int conversion methods). This would mirror what the above example does but in a more compact way. I'll try to sketch out some examples for such an API.
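For example, it might look something in this direction (purely hypothetical - none of these names exist in Dagster today): the field type declares which built-in config type represents it in run config, plus conversions in each direction.

from datetime import timedelta


# purely hypothetical sketch of such an API, not something Dagster ships today
class TimedeltaField:
    # how the value would appear in run config
    config_type = int

    @staticmethod
    def from_config(value: int) -> timedelta:
        return timedelta(seconds=value)

    @staticmethod
    def to_config(value: timedelta) -> int:
        return int(value.total_seconds())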
@ndellosa95 Thanks very much for sharing this!
@benpankow The issue in #14520 is potentially relevant here too as I imagine the solution for this would probably share a lot of surface area with that?
@benpankow Is there a reason why the older system is still running under the hood? Limiting to Dagster config types feels really arbitrary (there are no such limitations in FastAPI or Prefect, for example), and it makes it really painful to integrate with our current Pydantic setup.
@benpankow Did you ever get a chance to sketch out examples for that API? Do you all have plans to rip out the old type system, or is that more of a 2.x feature? If not, I have a generic version of the above workaround that I can contribute back to Dagster's codebase.