confection
Upgrade Pydantic to v2
Overview
Adds support for using either Pydantic v1 or v2.
All relevant Pydantic functions are extracted into utility functions that support both the Pydantic v1 and v2 APIs. CI tests run against both Pydantic v1 and v2, but type checking focuses on Pydantic v2.
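As a rough illustration of the kind of compatibility helper described above (not the actual code in this PR), here is a minimal sketch. The names `is_pydantic_v2` and `model_to_dict` are hypothetical; only the `PYDANTIC_V2` flag name comes from this discussion. It relies on v2 models exposing `model_dump()` and v1 models exposing `dict()`.

```python
from typing import Any, Dict


def is_pydantic_v2(version: str) -> bool:
    """Return True if a version string such as '2.0.3' has a major version >= 2."""
    return int(version.split(".", 1)[0]) >= 2


try:
    import pydantic

    # pydantic.VERSION is a plain version string in both v1 and v2.
    PYDANTIC_V2 = is_pydantic_v2(pydantic.VERSION)
except ImportError:
    # Assumption for this sketch only: default to False so it runs without pydantic.
    PYDANTIC_V2 = False


def model_to_dict(model: Any) -> Dict[str, Any]:
    """Hypothetical helper: serialize a model using whichever API it provides."""
    if hasattr(model, "model_dump"):
        # Pydantic v2 style
        return model.model_dump()
    # Pydantic v1 style
    return model.dict()
```

Callers would then use `model_to_dict(obj)` instead of calling `obj.dict()` or `obj.model_dump()` directly, keeping the version check in one place.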
Besides the dependency upgrade, there is one significant functional change: it fixes a bug where Pydantic models currently cannot be resolved from a registered function. More details in this comment: https://github.com/explosion/confection/pull/31/files#r1248253486
I had this installed in the background for testing and have run into a few problems with pydantic v1.10 (since you can't use v2 with thinc/spacy right now):
$ spacy init config -p morphologizer /tmp/morphologizer.cfg
ℹ Generated config template specific for your use case
- Language: en
- Pipeline: morphologizer
- Optimize for: efficiency
- Hardware: CPU
- Transformer: None
✘ Config validation error
nlp -> tokenizer Promise(registry='tokenizers', name='spacy.Tokenizer.v1', args=[], kwargs={}) is not callable
{'lang': 'en', 'pipeline': ['tok2vec', 'morphologizer'], 'batch_size': 1000, 'disabled': [], 'before_creation': None, 'after_creation': None, 'after_pipeline_creation': None, 'tokenizer': {'@tokenizers': 'spacy.Tokenizer.v1'}}
Or with an existing config including a morphologizer with an internal tok2vec:
[components.morphologizer.model.tok2vec]
@architectures = "spacy.Tok2Vec.v2"
[components.morphologizer.model.tok2vec.embed]
@architectures = "spacy.MultiHashEmbed.v2"
width = ${components.morphologizer.model.tok2vec.encode.width}
attrs = ["ORTH", "SHAPE"]
rows = [5000, 2500]
include_static_vectors = true
Config validation error
cfg.model.tok2vec -> embed instance of Model expected
cfg.model.tok2vec -> encode instance of Model expected
{'@architectures': 'spacy.Tok2Vec.v2', 'embed': {'@architectures': 'spacy.MultiHashEmbed.v2', 'width': 256, 'attrs': ['ORTH', 'SHAPE'], 'rows': [5000, 2500], 'include_static_vectors': True}, 'encode': {'@architectures': 'spacy.MaxoutWindowEncoder.v2', 'width': 256, 'depth': 8, 'window_size': 1, 'maxout_pieces': 3}}
I don't currently understand what the difference is between the unit tests with catalogue (which pass) and the usage with thinc/spacy and pydantic v1.10 (which fails as shown above).
Let me temporarily add a CI test to make sure it's not something on my end.
I'm trying to think if there's a better way to define/manage the PYDANTIC_V2 setting so that it doesn't lead to so many typing errors...