pydantic-core icon indicating copy to clipboard operation
pydantic-core copied to clipboard

Add `serialize_as_any` config flag + runtime support

Open sydney-runkle opened this issue 1 year ago • 5 comments

Contributing to https://github.com/pydantic/pydantic/issues/6423

TODO:

  • [x] Add support for dataclasses
  • [x] Add support for TypedDict
  • [x] Add much more comprehensive testing
  • [ ] Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)
  • [ ] Ask follow up questions about field_name = None
  • [ ] Resolve questions about extra usage, why we use mut in model.rs

sydney-runkle avatar Feb 15 '24 18:02 sydney-runkle

CodSpeed Performance Report

Merging #1194 will improve performances by 43.79%

Comparing ser_any_support (ffbbd34) with main (71d54a2)

Summary

⚡ 6 improvements ✅ 142 untouched benchmarks

🆕 4 new benchmarks

Benchmarks breakdown

Benchmark main ser_any_support Change
test_complete_core_json 2.6 ms 2.1 ms +21.47%
test_core_json_fs 733.4 µs 538.9 µs +36.11%
test_dict_of_ints_core_json 13.3 ms 10.1 ms +32.41%
🆕 test_enum_int_core N/A 20.4 µs N/A
🆕 test_enum_int_python N/A 39.1 µs N/A
🆕 test_enum_str_core N/A 21.1 µs N/A
🆕 test_enum_str_python N/A 38.6 µs N/A
test_list_of_ints_core_json 5.1 ms 3.9 ms +31.22%
test_set_of_ints_core_json 5.9 ms 4.7 ms +23.71%
test_set_of_ints_core_json_duplicates 3.6 ms 2.5 ms +43.79%

codspeed-hq[bot] avatar Feb 15 '24 18:02 codspeed-hq[bot]

Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)

Did you remove that from this PR? Sounds like it's probably orthogonal and might be irrelevant with the recent changes samuel and I made.

Ask follow up questions about field_name = None

? :)

Resolve questions about extra usage, why we use mut in model.rs

If you mean for the extra, IIRC the mut is a temporary fudge to make it possible to use that extra with the recursion guard. We should be refactoring the serialisation state.

davidhewitt avatar Feb 29 '24 16:02 davidhewitt

Resolutions from my conversation with @davidhewitt:

  • For now, we do need this at the model, dataclass, and field level, but notably, the field case is only for TypedDict cases
  • The recursion guard logic from Adrian's PR has been removed, as we think it's no longer necessary with more recent recursion guard changes
  • DH and I think that it could be beneficial to use a more generic type base class for the various model, dataclass, TypedDict serializers in the future, as there's definitely some replication that could be removed
  • The field_name stuff hasn't been changed, though it could be revisited in the future
  • The mut questions I had about extra stuff have also been resolved with my changes

sydney-runkle avatar Mar 12 '24 19:03 sydney-runkle

I see that maybe that was addressed in other discussion with DH. I have reviewed and don't have any objections, I'd feel better if @davidhewitt also approved though.

dmontagu avatar Mar 13 '24 20:03 dmontagu

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected:

def test_serialize_as_any_list_types() -> None:
    serializer = SchemaSerializer(core_schema.list_schema(core_schema.int_schema()))

    assert serializer.to_python(['a', 'b', 'c'], serialize_as_any=True) == ['a', 'b', 'c']

The annotated code which I'd argue is equivalent doesn't warn:

from pydantic import TypeAdapter, SerializeAsAny

TypeAdapter(SerializeAsAny[list[int]]).dump_python(["a", "b", "c"])

davidhewitt avatar Mar 14 '24 11:03 davidhewitt

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected

I think it makes sense to still warn here in general. SerializeAsAny if I understand correctly is more about how we handle subclasses

The annotated code which I'd argue is equivalent doesn't warn

Hmm, well in this case, we're actually using an any schema, hence the lack of warning. I think we could address this separately in pydantic when we modify the schema for a type wrapped in SerializeAsAny...

@adriangb at one point suggested:

if we make SerializeAsAny generate it's schema as some sort of union of infer and use this static schema it should work for all cases. SerializeAsAny[SecretStr] would use the static schema unless you pass in a subclass of str that has a serializer and SerializeAsAny[SomeModel] would always use the instance's serializer.

sydney-runkle avatar Mar 14 '24 12:03 sydney-runkle

I agree that SerializeAsAny is intended to be used for subclasses, but it does allow arbitrary type inference and my point above is that the annotation is still behaving differently to the runtime flag. I think users might find this subtle / surprising. It makes the behaviour something they have to discover empirically rather than easily reason about.

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

davidhewitt avatar Mar 14 '24 13:03 davidhewitt

I came here to ask the same question @davidhewitt asked about list[int] and ['a'] (also worth checking ['1']).

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

adriangb avatar Mar 14 '24 17:03 adriangb

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

@adriangb I think that this was really well said and carefully thought out. @davidhewitt, are you ok with this change? I think it allows us to sidestep the surprising inconsistencies with SerializeAsAny, while also moving forward with these changes!

sydney-runkle avatar Mar 15 '24 12:03 sydney-runkle

I can see the purity in this argument but I worry it adds complexity to users by having "duck typed serialization" and "serialize as any" as separate concepts. Especially when the intended use for both of them is to serialize subtypes in full, and neither of them as implemented behave like what I'd expect from "duck typing".

I would much prefer we just lived with this being called serialize_as_any and we worked to unify it with the annotation.

davidhewitt avatar Mar 15 '24 15:03 davidhewitt

@davidhewitt,

That makes sense, so perhaps we should proceed with this approach?

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

sydney-runkle avatar Mar 17 '24 01:03 sydney-runkle

@sydney-runkle do you think we could make the annotation behave more like this?

adriangb avatar Mar 17 '24 02:03 adriangb

@adriangb,

I think we would probably have to go the other way, that is, being more lax on the serialize_as_any runtime flag side of things. I've created this issue to track the discrepancies in behavior, and ultimately nudge us towards resolving those differences: https://github.com/pydantic/pydantic/issues/9049.

sydney-runkle avatar Mar 19 '24 12:03 sydney-runkle

Just waiting on https://github.com/pydantic/pydantic-core/pull/1236 to fix the bug mentioned in the most recent test that I wrote

sydney-runkle avatar Mar 21 '24 13:03 sydney-runkle