pydantic-core
pydantic-core copied to clipboard
Add `serialize_as_any` config flag + runtime support
Contributing to https://github.com/pydantic/pydantic/issues/6423
TODO:
- [x] Add support for dataclasses
- [x] Add support for TypedDict
- [x] Add much more comprehensive testing
- [ ] Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)
- [ ] Ask follow up questions about field_name = None
- [ ] Resolve questions about extra usage, why we use
mutin model.rs
CodSpeed Performance Report
Merging #1194 will improve performances by 43.79%
Comparing ser_any_support (ffbbd34) with main (71d54a2)
Summary
⚡ 6 improvements
✅ 142 untouched benchmarks
🆕 4 new benchmarks
Benchmarks breakdown
| Benchmark | main |
ser_any_support |
Change | |
|---|---|---|---|---|
| ⚡ | test_complete_core_json |
2.6 ms | 2.1 ms | +21.47% |
| ⚡ | test_core_json_fs |
733.4 µs | 538.9 µs | +36.11% |
| ⚡ | test_dict_of_ints_core_json |
13.3 ms | 10.1 ms | +32.41% |
| 🆕 | test_enum_int_core |
N/A | 20.4 µs | N/A |
| 🆕 | test_enum_int_python |
N/A | 39.1 µs | N/A |
| 🆕 | test_enum_str_core |
N/A | 21.1 µs | N/A |
| 🆕 | test_enum_str_python |
N/A | 38.6 µs | N/A |
| ⚡ | test_list_of_ints_core_json |
5.1 ms | 3.9 ms | +31.22% |
| ⚡ | test_set_of_ints_core_json |
5.9 ms | 4.7 ms | +23.71% |
| ⚡ | test_set_of_ints_core_json_duplicates |
3.6 ms | 2.5 ms | +43.79% |
Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)
Did you remove that from this PR? Sounds like it's probably orthogonal and might be irrelevant with the recent changes samuel and I made.
Ask follow up questions about field_name = None
? :)
Resolve questions about extra usage, why we use mut in model.rs
If you mean for the extra, IIRC the mut is a temporary fudge to make it possible to use that extra with the recursion guard. We should be refactoring the serialisation state.
Resolutions from my conversation with @davidhewitt:
- For now, we do need this at the model, dataclass, and field level, but notably, the field case is only for TypedDict cases
- The recursion guard logic from Adrian's PR has been removed, as we think it's no longer necessary with more recent recursion guard changes
- DH and I think that it could be beneficial to use a more generic type base class for the various model, dataclass, TypedDict serializers in the future, as there's definitely some replication that could be removed
- The
field_namestuff hasn't been changed, though it could be revisited in the future - The
mutquestions I had aboutextrastuff have also been resolved with my changes
I see that maybe that was addressed in other discussion with DH. I have reviewed and don't have any objections, I'd feel better if @davidhewitt also approved though.
So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected:
def test_serialize_as_any_list_types() -> None:
serializer = SchemaSerializer(core_schema.list_schema(core_schema.int_schema()))
assert serializer.to_python(['a', 'b', 'c'], serialize_as_any=True) == ['a', 'b', 'c']
The annotated code which I'd argue is equivalent doesn't warn:
from pydantic import TypeAdapter, SerializeAsAny
TypeAdapter(SerializeAsAny[list[int]]).dump_python(["a", "b", "c"])
So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected
I think it makes sense to still warn here in general. SerializeAsAny if I understand correctly is more about how we handle subclasses
The annotated code which I'd argue is equivalent doesn't warn
Hmm, well in this case, we're actually using an any schema, hence the lack of warning. I think we could address this separately in pydantic when we modify the schema for a type wrapped in SerializeAsAny...
@adriangb at one point suggested:
if we make SerializeAsAny generate it's schema as some sort of union of infer and use this static schema it should work for all cases. SerializeAsAny[SecretStr] would use the static schema unless you pass in a subclass of str that has a serializer and SerializeAsAny[SomeModel] would always use the instance's serializer.
I agree that SerializeAsAny is intended to be used for subclasses, but it does allow arbitrary type inference and my point above is that the annotation is still behaving differently to the runtime flag. I think users might find this subtle / surprising. It makes the behaviour something they have to discover empirically rather than easily reason about.
That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.
I came here to ask the same question @davidhewitt asked about list[int] and ['a'] (also worth checking ['1']).
Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.
Which does mean maybe we should have a DuckTypeSerialization[T] annotation.
Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.
Which does mean maybe we should have a DuckTypeSerialization[T] annotation.
@adriangb I think that this was really well said and carefully thought out. @davidhewitt, are you ok with this change? I think it allows us to sidestep the surprising inconsistencies with SerializeAsAny, while also moving forward with these changes!
I can see the purity in this argument but I worry it adds complexity to users by having "duck typed serialization" and "serialize as any" as separate concepts. Especially when the intended use for both of them is to serialize subtypes in full, and neither of them as implemented behave like what I'd expect from "duck typing".
I would much prefer we just lived with this being called serialize_as_any and we worked to unify it with the annotation.
@davidhewitt,
That makes sense, so perhaps we should proceed with this approach?
That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.
@sydney-runkle do you think we could make the annotation behave more like this?
@adriangb,
I think we would probably have to go the other way, that is, being more lax on the serialize_as_any runtime flag side of things. I've created this issue to track the discrepancies in behavior, and ultimately nudge us towards resolving those differences: https://github.com/pydantic/pydantic/issues/9049.
Just waiting on https://github.com/pydantic/pydantic-core/pull/1236 to fix the bug mentioned in the most recent test that I wrote