Add support for Pydantic 2

Open mattt opened this issue 1 year ago • 2 comments

Rebase of #1687, applied to main instead of async.

Providing context, unhelpfully, in the style of baroque legalese recitals:


THIS PR has been made on this day to bump Cog's supported range of Pydantic to >=1.9,<3

WHEREAS Pydantic is a data validation library that had major API changes from V1 to V2, and mixing of the Pydantic V1 and V2 models is not supported;

WHEREAS FastAPI and other Pydantic dependents use the V2 API when available, thereby precluding use of the v1 compatibility shim;

WITNESSETH THAT:

  1. A PYDANTIC_V2 constant is defined to support V1 and V2 APIs.
  2. The OpenAPI specification generated by newer versions of Pydantic + FastAPI requires manual intervention to retain the existing content and structure.
  3. Usage of the dict method has been deprecated in V2 in favor of the model_dump method.
  4. The API for Field has been altered in V2 such that:
    1. regex has been renamed to pattern
    2. choices has been removed in favor of Literal typing
    3. extras has been renamed to json_schema_extra
  5. The API for BaseModel has been altered in V2 such that:
    1. The model_config attribute determines configuration instead of a nested Config class.
    2. A regime of __get_pydantic_core_schema__ and __get_pydantic_json_schema__ determines validation and schema generation instead of the __get_validators__ and __modify_schema__ methods.
  6. Fields of type Any no longer have a default value of None, so we make them Optional and set a default value of None to get validation and schema generation to work correctly.
  7. Values of type io.IOBase are serialized by V2 pydantic_core as a generator, which wraps them in a pydantic_core._pydantic_core.SerializationIterator, an object that cannot be pickled and must be unwrapped before being passed across multiprocessing boundaries. [^1]

NOW THEREFORE, in consideration of the mutual covenants herein contained, it is agreed by and between the parties.

[^1]: Proper upstream fixes for this have been proposed by @yorickvP with https://github.com/pydantic/pydantic-core/pull/1399 and https://github.com/pydantic/pydantic-core/pull/1401.
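A minimal sketch of items 1 and 3 above, assuming the constant is derived from the installed package version. The helper names is_v2 and dump_model are hypothetical and are not taken from this PR; the actual definition of PYDANTIC_V2 in Cog may differ.

```python
from importlib.metadata import PackageNotFoundError, version


def is_v2(version_string: str) -> bool:
    """Return True when a version string such as "2.9.2" has major version 2 or higher."""
    major = version_string.split(".", 1)[0]
    return major.isdigit() and int(major) >= 2


try:
    # Assumption: the constant is computed once at import time from package metadata.
    PYDANTIC_V2 = is_v2(version("pydantic"))
except PackageNotFoundError:
    # pydantic is not installed in this environment.
    PYDANTIC_V2 = False


def dump_model(model, v2: bool = PYDANTIC_V2) -> dict:
    """Hypothetical helper: call model_dump() on V2 models and dict() on V1 models."""
    return model.model_dump() if v2 else model.dict()
```

Branching on a single constant keeps the V1 and V2 code paths visible at each call site, at the cost of carrying both until V1 support is dropped.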

mattt avatar Aug 06 '24 17:08 mattt

Before merging, we should update CI to test against both Pydantic v1 and v2

mattt avatar Aug 12 '24 10:08 mattt

Submitted a PR to pydantic-core to expose .iterator on SerializationIterator: https://github.com/pydantic/pydantic-core/pull/1399

yorickvP avatar Aug 12 '24 11:08 yorickvP

Would it be worth it to additionally typecheck with pydantic v1?

yorickvP avatar Sep 06 '24 13:09 yorickvP

@yorickvP Yes, indeed! If you have time before next week, it'd be great to add that to the tox matrix.

mattt avatar Sep 06 '24 15:09 mattt

@mattt I might have time later, but don't wait for me! We can merge that later.

yorickvP avatar Sep 12 '24 15:09 yorickvP

My test with python_dependencies: - pydantic>2 produced a pip install with fastapi==0.98 and pydantic==2.9.2, a combination that doesn't work. Not sure what to do about that, except re-specifying fastapi constraints.

yorickvP avatar Sep 19 '24 19:09 yorickvP

We will need to rethink how cog build installs dependencies.

  • Currently, there is a pip install cog step. This installs cog in the image, and picks pydantic2.
  • (base image is built here)
  • Then, user dependencies are installed. If this includes pydantic < 2, pip won't actually downgrade pydantic. It would have to run pip uninstall pydantic first.

Possible solutions:

  1. Depend on pydantic1 in the pip install cog step.

    • This upgrades to pydantic2 as needed during user package installation.
    • Pros: Can work for users using pydantic 1 and 2.
    • Con: This defaults to pydantic 1, while we want to switch to 2.
  2. Make the cog wheel participate in the user's dependency resolution.

    • This means we add /tmp/cog-something.whl to the requirements.txt and run the package installation as a single step.
    • Pro: dependency sets are always valid. Users get clear errors when there's a conflicting dependency.
    • Con: Stops us from including cog in the base images
    • Neutral: Stops users from building packages where dependencies conflict with cog. Not sure if that's desired.
  3. Add a pydantic: specifier to cog.yaml

    • Pro: makes it easy to have multiple base images
    • Con: not obvious for users when this is needed
    • Con: not backwards compatible
  4. Infer pydantic version from dependencies.

    • We already do this with torch.
    • Pro: makes it easy to have multiple base images
    • Con: not backwards compatible
    • Con: pydantic dependencies are often transitive, so this will require extra work for users
  5. Only allow building new images with pydantic2

    • Keep pydantic1 compat for existing images, but remove the functionality from cog go.
    • Pro: easiest to do (current behavior in this PR)
    • Con: Confusing to users (pydantic1 dependencies will break at runtime)
    • Con: not backwards compatible
  6. Switch to something smarter than pip

    • Out of scope for this PR
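For illustration only, option 4 could be sketched roughly as follows. This is not code from this PR, and it is not how Cog handles torch; infer_pydantic_major is a hypothetical helper that applies a simple heuristic to requirement strings. It only sees direct requirements, which is the transitive-dependency limitation listed under option 4.

```python
import re
from typing import Iterable, Optional

# Matches a requirement for pydantic itself (optionally with extras),
# but not for sibling packages such as pydantic-core or pydantic_settings.
_PYDANTIC_REQ = re.compile(
    r"^\s*pydantic(?![\w.-])\s*(?:\[[^\]]*\])?\s*(?P<spec>[^;#]*)",
    re.IGNORECASE,
)
# One version clause: an operator followed by a dotted numeric version.
_CLAUSE = re.compile(r"(===|==|~=|<=|>=|!=|<|>)\s*(\d+(?:\.\d+)*)")


def infer_pydantic_major(requirements: Iterable[str]) -> Optional[int]:
    """Guess the pydantic major version a requirement list pins, or None if unclear."""
    for line in requirements:
        match = _PYDANTIC_REQ.match(line)
        if match is None:
            continue
        for op, ver in _CLAUSE.findall(match.group("spec")):
            parts = [int(p) for p in ver.split(".")]
            major, rest = parts[0], parts[1:]
            if op in ("==", "===", "~="):
                return major  # exact or compatible-release pin
            if op in (">", ">=") and major >= 2:
                return major  # lower bound already at 2 or above
            if op == "<" and major == 2 and not any(rest):
                return 1  # "<2", "<2.0": only 1.x can satisfy this
            if op in ("<", "<=") and major == 1 and any(rest):
                return 1  # upper bound inside the 1.x series
    return None  # no direct pydantic requirement, or the range spans majors


print(infer_pydantic_major(["torch==2.1", "pydantic>2"]))
print(infer_pydantic_major(["fastapi==0.98"]))
```

A range such as pydantic>=1.9,<3 spans both major versions, so the sketch returns None and leaves the choice to the caller.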

yorickvP avatar Sep 20 '24 09:09 yorickvP

I went with option 1, which is the easiest for now. We can defer making pydantic2 a default to a later PR.

yorickvP avatar Sep 20 '24 10:09 yorickvP

IMO that's the correct choice at least for the moment: keep things as they are now, but make it so that vllm doesn't break

technillogue avatar Sep 20 '24 17:09 technillogue