dagster icon indicating copy to clipboard operation
dagster copied to clipboard

Allow any valid Pydantic field type to be used in Config

Open ndellosa95 opened this issue 1 year ago • 8 comments

What's the use case?

Currently Dagster only allows primitive, collection, Config or discriminated union types in Pydantic configs. However, Pydantic provides a bunch of field types aside from these, as well as the ability to define a custom field type. Ideally anything that is valid in Pydantic should be valid for dagster configs.

Alternatively, there should be a dagster specific way to create custom field types from the currently allowed field types. In dagster 1.3.6, this is like half-implemented if you subclass Config and yield a validator in __get_validators__ that accepts both Mapping and non-Mapping input (say a string as an example) you can pass the value of the field in your preferred format directly to the Config / ConfiguredResource class. However, the subclass still needs to accept a Mapping input and if specifying the config via run config, it needs to be in the mapping format.

Ideas of implementation

I'm not sure what internals (if any) make the primitive, collection, Config or discriminated union types restriction necessary, but ideally Dagster should just offload the validation of config fields off to Pydantic. Anything that Pydantic accepts Dagster should also accept.

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

ndellosa95 avatar Jun 01 '23 15:06 ndellosa95

@benpankow I hacked your type system 😅

By creating a custom metaclass that inherits from ConfigScalar for my custom datatype and overriding _get_non_none_public_field_values in my resource, I was able to get my custom datatype to serialize and validate as a string but it appears as my custom type in the resource object itself.

So the simplest implementation of this might be to just have an abc to setup a custom type that just needs to know how to serialize the class in a config dict and we can rely on Pydantic to validate that the input is correct.

ndellosa95 avatar Jun 02 '23 16:06 ndellosa95

We ran in to this recently. @pzarabadip, am I remembering correctly that datetime.timedelta instances weren't working? (Sorry for the ping, not at my laptop so can't check!)

mjclarke94 avatar Jun 02 '23 18:06 mjclarke94

Hey @ndellosa95. As @mjclarke94 mentioned, we've got similar issue with timedelta. Following code snipped results in validation failure:

from dagster import asset, Config, OpExecutionContext
from datetime import timedelta

class MyConfig(Config):
    input: timedelta


@asset
def test_timedelta(context: OpExecutionContext, config: MyConfig):
    return context.log.info("test")

with following (trimmed) log:

Unable to resolve config type <class 'datetime.timedelta'> to a supported Dagster config type.\n\n\nThis config type can be a:\n    - Python primitive type\n        - int, float, bool, str, list\n    - A Python Dict or List type containing other valid types\n    - Custom data classes extending dagster.Config\n    - A Pydantic discriminated union type (https://docs.pydantic.dev/usage/types/#discriminated-unions-aka-tagged-unions)\n"

ezpzbz avatar Jun 02 '23 18:06 ezpzbz

Hey @pzarabadip, @mjclarke94!

So I'm not sure whether Dagster intends to support custom types, or types without 1:1 JSON serialization, or not. But I'll use your example to showcase my workaround. The hack I came up with gets around this by 1) tricking the type system into thinking your type is a Dagster-config type and 2) providing custom JSON serialization logic for fields with your type. So if you wanted to serialize your timedeltas as an integer of total seconds, here's what the code would look like (I did not test this, but it's fairly cut and paste from what I'm using myself):

from typing import Any, Mapping
from dagster import asset, Config, OpExecutionContext
from dagster._config.config_type import ConfigScalar, ConfigScalarKind
from datetime import timedelta


class _DagsterConfigTimedeltaMeta(type, ConfigScalar):
    def __init__(cls, name, bases, dct):
        type.__init__(cls, name, bases, dct)
        ConfigScalar.__init__(cls, key="timedelta", given_name="Timedelta", scalar_kind=ConfigScalarKind.INT)


class DagsterConfigTimedelta(timedelta, metaclass=_DagsterConfigTimedeltaMeta):

    @classmethod
    def _make_timedelta_from_seconds(cls, value):  # Self type return
        if isinstance(value, cls):
            return value
        if isinstance(value, int):
            return cls(seconds=value)
        raise ValueError  # your error message here
    
    @classmethod
    def __get_validators__(cls):
        yield cls._make_timedelta_from_seconds


class MyConfig(Config):
    input: DagsterConfigTimedelta

    def _get_non_none_public_field_values(self) -> Mapping[str, Any]:
        """Returns a dictionary representation of this config object,
        ignoring any private fields, and any optional fields that are None.

        Inner fields are returned as-is in the dictionary,
        meaning any nested config objects will be returned as config objects, not dictionaries.
        """

        def map_timedelta_field(field_key: str, value):
            field = self.__fields__.get(field_key)
            if field and issubclass(field.type_, DagsterConfigTimedelta):
                return value.total_seconds()
            return value

        return {k: map_timedelta_field(k, v) for k, v in super()._get_non_none_public_field_values().items()} 


@asset
def test_timedelta(context: OpExecutionContext, config: MyConfig):
    return context.log.info("test")

ndellosa95 avatar Jun 05 '23 13:06 ndellosa95

This workaround is very neat to see! Right now, the chief constraint to the Pydantic config layer is that it's translated to the older Dagster config system under the hood - as you're doing by extending ConfigScalar in your example.

I think it's not an unreasonable ask to introduce a more native way to tell Dagster which Dagster config type can be used to represent the field type, and a way to convert to and from that field type (e.g. providing timedelta <-> int conversion methods). This would mirror what the above example does but in a more compact way. I'll try to sketch out some examples for such an API.

benpankow avatar Jun 05 '23 20:06 benpankow

@ndellosa95 Thanks very much for sharing this!

@benpankow The issue in #14520 is potentially relevant here too as I imagine the solution for this would probably share a lot of surface area with that?

mjclarke94 avatar Jun 05 '23 20:06 mjclarke94

@benpankow is there a reason for why the older system is still running under the hood? limiting to dagster config types feels really arbitrary (no such limitations in fastapi or e.g. prefect), and makes it really painful to integrate with our current pydantic setup

marcinplatek avatar Apr 16 '24 11:04 marcinplatek

This workaround is very neat to see! Right now, the chief constraint to the Pydantic config layer is that it's translated to the older Dagster config system under the hood - as you're doing by extending ConfigScalar in your example.

I think it's not an unreasonable ask to introduce a more native way to tell Dagster which Dagster config type can be used to represent the field type, and a way to convert to and from that field type (e.g. providing timedelta <-> int conversion methods). This would mirror what the above example does but in a more compact way. I'll try to sketch out some examples for such an API.

@benpankow Did you ever do this? Do you all have plans to rip out the old type system or is that more of a 2.x feature? If not, I have a generic version of the above workaround I can contribute back to dagster's codebase.

ndellosa95 avatar Apr 25 '24 13:04 ndellosa95