
Upgrade Pydantic to v2

Open kabirkhan opened this issue 2 years ago • 3 comments

Overview

Adds support for using Pydantic v1 or v2.

All relevant Pydantic functions are extracted into utility functions that support both the Pydantic v1 and v2 APIs. CI tests run against both Pydantic v1 and v2, but type checking focuses on Pydantic v2.
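As an illustration of what such a utility might look like, here is a minimal sketch. The names `model_to_dict` and `model_field_names` are hypothetical and are not taken from this PR; the sketch only relies on the public Pydantic method names (`model_dump` / `model_fields` in v2, `dict` / `__fields__` in v1).

```python
from typing import Any, Dict, List


def model_to_dict(model: Any) -> Dict[str, Any]:
    """Dump a model instance to a dict using whichever API it exposes.

    Hypothetical helper for illustration, not confection's actual code.
    """
    # Pydantic v2 models expose `model_dump`; check it first because v2
    # still carries a deprecated `dict` method.
    if hasattr(model, "model_dump"):
        return model.model_dump()
    # Pydantic v1 models only expose `dict`.
    return model.dict()


def model_field_names(model_cls: Any) -> List[str]:
    """Return the field names of a model class for either major version."""
    # v2 stores field definitions in `model_fields`, v1 in `__fields__`.
    fields = getattr(model_cls, "model_fields", None)
    if fields is None:
        fields = model_cls.__fields__
    return list(fields)
```

Keeping the version branching inside helpers like these means calling code never touches a version-specific method directly.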

Besides the dependency upgrade, there is one significant functional change: it fixes a bug where Pydantic models currently cannot be resolved from a registered function. More details in this comment: https://github.com/explosion/confection/pull/31/files#r1248253486

kabirkhan avatar Jun 12 '23 15:06 kabirkhan

I had this installed in the background for testing and have run into a few problems with pydantic v1.10 (since you can't use v2 with thinc/spacy right now):

$ spacy init config -p morphologizer /tmp/morphologizer.cfg
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: morphologizer
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✘ Config validation error
nlp -> tokenizer	Promise(registry='tokenizers', name='spacy.Tokenizer.v1', args=[], kwargs={}) is not callable
{'lang': 'en', 'pipeline': ['tok2vec', 'morphologizer'], 'batch_size': 1000, 'disabled': [], 'before_creation': None, 'after_creation': None, 'after_pipeline_creation': None, 'tokenizer': {'@tokenizers': 'spacy.Tokenizer.v1'}}

Or with an existing config including a morphologizer with an internal tok2vec:

[components.morphologizer.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"

[components.morphologizer.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.morphologizer.model.tok2vec.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = true

Config validation error
cfg.model.tok2vec -> embed	instance of Model expected
cfg.model.tok2vec -> encode	instance of Model expected
{'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v2', 'width': 256, 'attrs': ['ORTH', 'SHAPE'], 'rows': [5000, 2500], 'include_static_vectors': True}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 256, 'depth': 8, 'window_size': 1, 'maxout_pieces': 3}}

adrianeboyd avatar Jul 31 '23 09:07 adrianeboyd

I don't currently understand what the difference is between the unit tests with catalogue (which pass) and the usage with thinc/spacy and pydantic v1.10 (which fails as shown above).

Let me temporarily add a CI test to make sure it's not something on my end.

adrianeboyd avatar Aug 03 '23 15:08 adrianeboyd

I'm trying to think if there's a better way to define/manage the PYDANTIC_V2 setting so that it doesn't lead to so many typing errors...
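One possible shape for this, as a rough sketch only: compute the flag once from the installed package metadata and keep it in a single module. The helper names `major_version` and `detect_pydantic_v2` are hypothetical, and how the PR actually defines `PYDANTIC_V2` is not shown in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '1.10.12'."""
    return int(version_string.split(".", 1)[0])


def detect_pydantic_v2() -> bool:
    """Report whether the installed Pydantic distribution is v2 or later."""
    try:
        return major_version(version("pydantic")) >= 2
    except PackageNotFoundError:
        # Assumption for this sketch: treat a missing install as "not v2".
        return False


# Computed once at import time; other modules import this constant.
PYDANTIC_V2 = detect_pydantic_v2()
```

A type checker cannot know the runtime value of such a flag, so both branches of every `if PYDANTIC_V2:` get checked. If mypy is the checker, its `--always-true` option may be worth trying here (for example `--always-true PYDANTIC_V2`), since type checking is meant to focus on v2 anyway.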

adrianeboyd avatar Aug 04 '23 06:08 adrianeboyd