spaCy Merge the parser refactor into `v4`

Description

This is a large refactor of the parsing model, started by @honnibal. The refactor turns the paring model into a more canonical Thinc model. As a result, GPU training performance is much-improved, since backpropagation can be done for all states at once, rather than per parsing step.

Types of change

Enhancement.

Checklist

[x] I confirm that I have the right to submit this contribution under the project's MIT license.
[x] I ran the tests, and all new and existing tests passed.
[x] My changes don't require a change to the documentation, or if they do, I've added all required information.

Jun 09 '22 12:06 danieldk

I think we can now remove some of the commented out lines that reference the v2 parser, seeing how we won't be porting it back to spacy-legacy?

Jun 24 '22 10:06 shadeMe

I think we can now remove some of the commented out lines that reference the v2 parser, seeing how we won't be porting it back to spacy-legacy?

Removed all the references to the old versions.

Jun 24 '22 10:06 danieldk

When did you make the decision not to have legacy support for spacy.TransitionParser.v1 and spacy.TransitionParser.v2? This is extremely breaking.

We can deal with having the output be slightly different given that the factory behavior is versioned on the spacy version (so parser+v3.3.0 may not produce the same output as parser+v3.4.0 with identical configs), but I think we should try extremely hard not to break existing v3 configs in v4.

Sep 02 '22 07:09 adrianeboyd

When did you make the decision not to have legacy support for spacy.TransitionParser.v1 and spacy.TransitionParser.v2? This is extremely breaking.

First the idea was to bring back the older versions, possibly as part of spacy-legacy. IIRC we discussed this with Matt and he thought it'd be best to just remove the old version.

Since there are changes to the layout of the parameters, etc. it'd be hard to support the v1/v2 versions with the new code base. So, we'll have to fork what is now in spaCy 3 to keep the older versions around.

Sep 02 '22 07:09 danieldk

I don't care about the model compatibility, just the config compatibility. So this continues to work:

[components.parser]
factory = "parser"
learn_tokens = false
min_action_freq = 30
moves = null
scorer = {"@scorers":"spacy.parser_scorer.v1"}
update_with_oracle_cut_size = 100

[components.parser.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "parser"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.parser.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode:width}
upstream = "tok2vec"

Sep 02 '22 07:09 adrianeboyd

We had a similar situation for the entity_linker, where a new implementation of the component wasn't compatible anymore with older config versions. What we decided at the time is to keep the old code around in spacy.pipeline.legacy and to have a "hack" in the factory to decide when to use this legacy version: https://github.com/explosion/spaCy/blob/master/spacy/pipeline/entity_linker.py#L102

It's ugly, but does mean the old configs can still be used. I also imagine that the legacy parser would probably entail a lot more than just the one Python file like we have for the entity_linker 😬

Sep 06 '22 13:09 svlandeg

It's ugly, but does mean the old configs can still be used. I also imagine that the legacy parser would probably entail a lot more than just the one Python file like we have for the entity_linker 😬

I'll try and see how far we can get. I think configuration-wise the largest difference is that in the new parser we always use the upper layer. So I guess that means that if the user specifies use_upper = false, we have to emit a warning and use the upper layer anyway. But there may be other differences that I am not thinking of now.

Sep 06 '22 13:09 danieldk

I have added back spacy.TransitionParser.v2 and (besides adding it to the tests) tried training a model from a configuration generated using the quickstart.

spacy.TransitionParser.v1 lives in spacy-legacy, so I think we have to update that version in legacy later.

I am not sure what to do about:

spacy.PrecomputableAffine.v1, since this is not a separate model anymore.
spacy.TransitionModel.v1, the model and arguments changed a fair bit.

Though I think these are rarely used directly in spaCy model configurations?

Sep 27 '22 10:09 danieldk

I am not sure what to do about:

* `spacy.PrecomputableAffine.v1`, since this is not a separate model anymore.

* `spacy.TransitionModel.v1`, the model and arguments changed a fair bit.

_precomputable_affine.py could just be moved in its entirety to spacy-legacy, no? It doesn't have any internal dependencies I think?

Similarly, can't we just move the the code for spacy.TransitionModel.v1 to legacy?

It would be good to start that PR for spacy-legacy while wrapping up this PR, then we can also bump legacy's version here.

Oct 03 '22 14:10 svlandeg