Union type of `Enum` and primitive type always gives primitive type

Open lmmx opened this issue 2 years ago • 5 comments

Initial Checks

  • [X] I confirm that I'm using Pydantic V2

Description

For example, Union[IntEnum, int] always gives int, an enum with string values in a union with str always gives str, and so on.

I tried lots of ways of defining a Union type where an Enum is one option and the Enum values' primitive type is the other option. None of these attempts worked, so I thought I'd ask here in case I'm missing something obvious or not so obvious.

i.e. the union never resolves to the enum, even though the same value validates to the enum once the primitive int option is removed from the union.

A Literal works instead of an enum, but Enums are documented as if they're valid for representing multiple choices, so I'm surprised that this isn't working. If I have to write a manual field validator it feels like the type annotation machinery that Pydantic uses so well is going to waste.

This also prevents 'splitting' models based on a union annotation: as shown here, Literal would let you choose a different model to handle binary numbers but an Enum will never be used so would effectively produce a single 'un-split' model.

from enum import Enum
from typing import Literal, Union

from pydantic import RootModel, TypeAdapter


class BinaryEnum(Enum):
    ZERO = 0
    ONE = 1


class EnumRootModel(RootModel):
    root: BinaryEnum


class LiteralRootModel(RootModel):
    root: Literal[0, 1]


LiteralUnion = Union[LiteralRootModel, int]
EnumUnion = Union[EnumRootModel, int]

print("Using Literal root model:")
print(TypeAdapter(list[LiteralUnion]).validate_python([1, 2]))

print()

print("Using Enum root model:")
print(TypeAdapter(list[EnumUnion]).validate_python([1, 2]))

Output:

Using Literal root model:
[LiteralRootModel(root=1), 2]

Using Enum root model:
[1, 2]
  • Using a RootModel with int-type root produces the same result

I find this difference in behaviour counterintuitive to the point that I'm inclined to think it's a bug.

When I used datamodel-code-generator, my memory is that you could choose between Literals and Enums interchangeably there, so this might be affecting models produced by it, though I'm not sure.
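For reference, the manual field validator mentioned above could look roughly like the sketch below. It is only an illustration: the Reading model and the prefer_enum method are hypothetical names, and it assumes that a "before" validator returning an enum member leads the union to keep that member, since the plain int option should not accept an Enum instance.

```python
from enum import Enum
from typing import Union

from pydantic import BaseModel, field_validator


class BinaryEnum(Enum):
    ZERO = 0
    ONE = 1


class Reading(BaseModel):  # hypothetical model, for illustration only
    value: Union[BinaryEnum, int]

    @field_validator("value", mode="before")
    @classmethod
    def prefer_enum(cls, v):
        # Convert plain ints that match a member before union validation runs.
        if isinstance(v, int) and not isinstance(v, bool):
            try:
                return BinaryEnum(v)
            except ValueError:
                # Not a member value: leave it for the int branch of the union.
                return v
        return v


print(repr(Reading(value=1).value))  # a matching value becomes an enum member
print(repr(Reading(value=2).value))  # a non-matching value stays a plain int
```

This gets the desired result, but it duplicates in a validator what the annotation already says, which is the complaint made in the description.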

Example Code

from enum import Enum
from typing import Union

from pydantic import TypeAdapter


class Binary(Enum):
    x = 0
    y = 1


union_fwd = Union[Binary, int]
union_rev = Union[int, Binary]

val = 1

enum_result = TypeAdapter(Binary).validate_python(val)
fwd_result = TypeAdapter(union_fwd).validate_python(val)
rev_result = TypeAdapter(union_rev).validate_python(val)
int_result = TypeAdapter(int).validate_python(val)

assert fwd_result == rev_result == int_result == val
assert enum_result == Binary.y

Python, Pydantic & OS Version

pydantic version: 2.1.1
        pydantic-core version: 2.4.0
          pydantic-core build: profile=release pgo=true mimalloc=true
                 install path: /home/louis/miniconda3/envs/pydanticv2/lib/python3.11/site-packages/pydantic
               python version: 3.11.4 (main, Jul  5 2023, 13:45:01) [GCC 11.2.0]
                     platform: Linux-5.15.0-43-generic-x86_64-with-glibc2.35
     optional deps. installed: ['typing-extensions']

Selected Assignee: @hramezani

lmmx avatar Aug 13 '23 15:08 lmmx

I saw it noted elsewhere by @adriangb that:

v2 does multiple passes on unions. the first without coercion and if none of the options match it does another pass with coercion

Would that description maybe explain this case? Is the first pass trying to match str type, and finding a str in an enum value, so not resorting to the 2nd pass to coerce the str value to Enum?

I’m not familiar with the internals of this routine so don’t know where to look to confirm or not.

  • Perhaps _internal._std_types_schema’s get_enum_core_schema()
    • Maybe it needs a validator with __get_pydantic_core_schema__ (like some of the other types listed there)?

This bug might also be phrased as “strict coercion is always used for Union of Enum and the Enum value’s type (or subclassed type)” if so.

  • The “subclassed type” here meaning int for IntEnum, str for StrEnum, but the same effect is seen when using a regular Enum.
  • Indeed I expect you could have a regular Enum with both int and str values, and a Union with int and str would likewise resolve to the primitive types.
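To make the two-pass description quoted above concrete, here is a toy sketch in plain Python. It is not pydantic's implementation: toy_union and the strict/lax helper functions are invented for illustration, and it assumes that a strict enum check only accepts actual enum members while a strict int check accepts any plain int.

```python
from enum import Enum


class Binary(Enum):
    x = 0
    y = 1


def strict_enum(value):
    # Strict: only an actual Binary member is accepted.
    if isinstance(value, Binary):
        return value
    raise ValueError("not a Binary member")


def lax_enum(value):
    # Lax: look the member up by value; raises ValueError if there is none.
    return Binary(value)


def strict_int(value):
    # Strict: only a real int (not bool, not str) is accepted.
    if type(value) is int:
        return value
    raise ValueError("not an int")


def lax_int(value):
    # Lax: allow coercion, e.g. "1" -> 1.
    return int(value)


def toy_union(value, choices):
    """Try every choice strictly first, then every choice with coercion."""
    for pass_index in (0, 1):
        for validators in choices:
            try:
                return validators[pass_index](value)
            except (ValueError, TypeError):
                continue
    raise ValueError(f"no union member accepts {value!r}")


# Binary is listed first, mirroring Union[Binary, int].
choices = [(strict_enum, lax_enum), (strict_int, lax_int)]

print(repr(toy_union(1, choices)))         # the strict int check succeeds first
print(repr(toy_union(Binary.y, choices)))  # the strict enum check succeeds
```

Under these assumptions the input 1 never reaches the coercing pass, because the strict int check already succeeds in the first pass, regardless of the order of the union members.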

lmmx avatar Aug 15 '23 09:08 lmmx

I will aim to improve this in https://github.com/pydantic/pydantic-core/pull/867

(I will see if it's possible to prefer the Enum class if possible.)

davidhewitt avatar Aug 15 '23 09:08 davidhewitt

This is now fixed on main, thanks @davidhewitt (I'm guessing an old fix from https://github.com/pydantic/pydantic-core/pull/867).

sydney-runkle avatar Mar 27 '24 04:03 sydney-runkle

Maybe I'm misunderstanding something here, but I'm running into the same issue and don't get the impression that it was fixed. I am using pydantic 2.7.4.

from enum import Enum
from typing import Union
import pydantic

print(pydantic.VERSION)


class FooEnum(Enum):
    A = 1
    B = 2


class BarModel(pydantic.BaseModel):
    foo: Union[FooEnum, int]


bar = BarModel(foo=1)

print(bar.foo)
print(type(bar.foo))
print(bar.model_dump())

And the output is:

2.7.4
1
<class 'int'>
{'foo': 1}

TheDelus avatar Jun 14 '24 11:06 TheDelus

Reopening, I think this was a regression with our migration of enum validators to rust!

sydney-runkle avatar Jun 28 '24 17:06 sydney-runkle

Reopening, I think this was a regression with our migration of enum validators to rust!

Blessed are the backlog revivers :pray:

lmmx avatar Sep 09 '24 19:09 lmmx

I'll be trying to rewrite our enum validator in Rust for v2.10 :) to fix these regressions / discrepancies with earlier versions!

sydney-runkle avatar Sep 10 '24 14:09 sydney-runkle

Circling back here as I make changes to the union / literal / enum validators.

I think this behavior is correct - it makes sense to me that we prioritize primitives - they're more of an exact match based on our exact / strict / lax match scoring system.

See below for another example:

from pydantic import TypeAdapter
from enum import Enum
from typing import Literal

class MyEnum(str, Enum):
    FOO = 'foo'
    BAR = 'bar'

ta = TypeAdapter(MyEnum | Literal['foo'])
assert ta.validate_python('foo') == 'foo'
assert ta.validate_python(MyEnum.FOO) == 'foo'
assert ta.validate_python('bar') == MyEnum.BAR

Going to mark this as not planned for now.
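For anyone who still wants the enum member whenever the input matches one, a possible workaround (a sketch, not something proposed in this thread) is to opt out of the default smart mode for that field. It assumes a pydantic version that supports Field(union_mode='left_to_right'), and that in this mode each union member is tried in order with the usual coercion, so the enum is attempted before int.

```python
from enum import Enum
from typing import Union

from pydantic import BaseModel, Field


class FooEnum(Enum):
    A = 1
    B = 2


class BarModel(BaseModel):
    # Assumption: left_to_right tries FooEnum first and falls back to int.
    foo: Union[FooEnum, int] = Field(union_mode="left_to_right")


print(repr(BarModel(foo=1).foo))  # 1 matches a member, so the enum branch is used
print(repr(BarModel(foo=3).foo))  # 3 matches no member, so the int branch is used
```

The trade-off is that the order of the union members now matters: with Union[int, FooEnum] the int branch would be tried first and would accept the value.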

sydney-runkle avatar Sep 20 '24 13:09 sydney-runkle