anthropic-sdk-python icon indicating copy to clipboard operation
anthropic-sdk-python copied to clipboard

Feature-request: Pydantic validators+serializers to be able to round-trip all supported types

Open charles-dyfis-net opened this issue 1 year ago • 11 comments

Right now, pydantic can't instantiate a TypeAdapter for anthropic.types.MessageParam on account of the support for file-like objects (which, by nature, can't be serialized to JSON) in the data types used for image support. Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam) will throw a PydanticSchemaGenerationError, because typing.IO[bytes] can't be represented as a pydantic_core schema.

If instead of using TypedDict the various classes were implemented as dataclasses (ideally using pydantic.dataclasses.dataclass), or were implemented using subclasses of pydantic.BaseModel, these classes could define custom serializers to convert into JSON-representable data -- for example, serializing a file-like object by actually reading its content into memory.

charles-dyfis-net avatar Jun 27 '24 00:06 charles-dyfis-net

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

cc @RobertCraigie

rattrayalex avatar Jun 27 '24 04:06 rattrayalex

Attempting to instantiate a pydantic.TypeAdapter(anthropic.types.MessageParam)

Can you share more about your use-case? What are you trying to do?

Sure -- I'm trying to serialize pending LLM requests to JSON to put them in a work queue, with a consumer for each backend able to execute them (one per Bedrock region with the appropriate model, one for Anthropic first-party, &c) and then deserialize and run those requests.

charles-dyfis-net avatar Jun 27 '24 11:06 charles-dyfis-net

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

lingster avatar Jul 04 '24 03:07 lingster

Rather than serialising the entire object, if it's a file could you not store in an s3 or R2 bucket and serialize the url and just add that to your queue.

I don't actually need to serialize file-like objects.

Thing is, that doesn't matter: Because file-like objects are possible in a MessageParam, I can't instantiate a pydantic.TypeAdapter for MessageParam instances; Pydantic wants to be able to build a JSONSchema description of the type, so as long as there's something in the union that can't be represented in JSONSchema, the TypeAdapter instantiation fails during introspection before ever looking at the individual instance and what values are or aren't present.

That's the point of adding a serializer that replaces those objects with their content: the act of doing so will make messages serializable in practice even if they don't use the option to have a file handle attached, and it'll do so losslessly (in a way that lets folks use the Anthropic API and Pydantic together in a way that's natural to each and adds no extra configuration or dependencies); perhaps a bit inefficient compared to S3 or R2, but someone who cares about that inefficiency and is willing to add new service dependencies can add their own code to store content out-of-band as they see fit.

charles-dyfis-net avatar Jul 04 '24 05:07 charles-dyfis-net

@charles-dyfis-net can you share a full example of the code you'd like to be able to write, and what you have to do today?

rattrayalex avatar Jul 06 '24 21:07 rattrayalex

Have you looked at our .to_json() helpers? Do they help at all?

rattrayalex avatar Jul 06 '24 21:07 rattrayalex

Have you looked at our .to_json() helpers? Do they help at all?

I haven't; if there exist corresponding from_json() helpers to be able to round-trip back to an object, that would be exactly what I need.

charles-dyfis-net avatar Jul 06 '24 22:07 charles-dyfis-net

mmm, I think something like Message.from_json('{"foo": …}') could make sense!

FWIW, I'd expect this to internally look roughy like this:

data = json.loads(…)
return Message.build(**data)

care to give that a try and see how it goes for you?

rattrayalex avatar Jul 06 '24 22:07 rattrayalex

Thank you -- I'll do that, hopefully within the next few days. (I'd still prefer to see Pydantic's (de)serialization work out-of-the-box, so folks don't need to implement logic specific to the Anthropic SDK, but if this does in fact work as advertised that reduces the priority / pain level significantly).

charles-dyfis-net avatar Jul 06 '24 22:07 charles-dyfis-net

Great, let me know what you find!

rattrayalex avatar Jul 07 '24 21:07 rattrayalex

Hi @charles-dyfis-net, in the next release you'll be able to use MessageParam with TypeAdapters :)

Image params won't serialise properly yet as we haven't defined a custom serialiser to handle file inputs, will have more to share on that front soon.

RobertCraigie avatar Jul 10 '24 13:07 RobertCraigie