spaCy Store activations in `Doc`s when `store_activations` is enabled

Description

This change adds the new activations attribute to Doc. This attribute can be used by trainable pipes to store their activations, probabilities, and guesses for downstream users.

As an example, this change modifies the tagger and senter pipes to add a store_activations option. When this option is enabled, set_annotations stores the probabilities and guesses on the Doc.
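To make the mechanism concrete, here is a minimal sketch in plain Python. It does not use spaCy: ToyDoc, ToyTagger, the softmax helper, the fake scoring rule, and the "probabilities" and "guesses" keys are all stand-ins invented for illustration, and the layout of the real Doc.activations attribute may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc (not the real class)."""
    words: List[str]
    tags: List[str] = field(default_factory=list)
    # Assumed layout: pipe name -> activation name -> stored values.
    activations: Dict[str, Dict[str, list]] = field(default_factory=dict)


def softmax(scores: List[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyTagger:
    """Hypothetical trainable pipe with a store_activations option."""

    def __init__(self, labels: List[str], store_activations: bool = False):
        self.name = "tagger"
        self.labels = list(labels)
        self.store_activations = store_activations

    def predict(self, doc: ToyDoc) -> List[List[float]]:
        # Fake "model": scores depend only on word length and label index.
        n = len(self.labels)
        return [softmax([float((len(w) + i) % 3) for i in range(n)])
                for w in doc.words]

    def set_annotations(self, doc: ToyDoc, probs: List[List[float]]) -> None:
        # The guess for each token is the index of its highest probability.
        guesses = [max(range(len(row)), key=row.__getitem__) for row in probs]
        doc.tags = [self.labels[g] for g in guesses]
        if self.store_activations:
            # Only keep the intermediate values when the option is enabled.
            doc.activations[self.name] = {
                "probabilities": probs,
                "guesses": guesses,
            }

    def __call__(self, doc: ToyDoc) -> ToyDoc:
        self.set_annotations(doc, self.predict(doc))
        return doc


tagger = ToyTagger(["NOUN", "VERB", "ADJ"], store_activations=True)
doc = tagger(ToyDoc(["spaCy", "stores", "activations"]))
print(doc.tags)
print(doc.activations["tagger"]["guesses"])
```

With the option left at its default of False, the same call sets doc.tags but leaves doc.activations empty, so users who do not need the activations pay no extra memory cost.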

Edit: out of draft.

Types of change

Proposal.

Checklist

[x] I confirm that I have the right to submit this contribution under the project's MIT license.
[x] I ran the tests, and all new and existing tests passed.
[ ] My changes don't require a change to the documentation, or if they do, I've added all required information.

Jun 22 '22 08:06 danieldk

Adding a field to Doc is breaking, so I think this would need to wait until v4?

Jun 27 '22 13:06 adrianeboyd

Adding a field to Doc is breaking, so I think this would need to wait until v4?

Yeah. I was going to rebase this onto v4, but we first need to update v4 so that all of its commits do not end up in this PR:

https://github.com/explosion/spaCy/pull/11034

Jun 27 '22 13:06 danieldk

Something went awry with the merge, will fix this later.

Jun 27 '22 14:06 danieldk

Something went awry with the merge, will fix this later.

Order restored :).

Jun 27 '22 17:06 danieldk

Fixed merge conflicts.

Aug 01 '22 07:08 danieldk

I think that set_store_activations is a really confusing name. Are you setting? Are you storing? Are you setting "store activations"? What are "store activations"? Are they "stored activations"?

Aug 05 '22 08:08 adrianeboyd

I think that set_store_activations is a really confusing name. Are you setting? Are you storing? Are you setting "store activations"? What are "store activations"? Are they "stored activations"?

😁

set_persistent_activations and maybe use the name "persistent" in other places as well?

Aug 05 '22 11:08 svlandeg

set_persistent_activations and maybe use the name "persistent" in other places as well?

set_persisted_activations or maybe set_saved_activations?

Aug 05 '22 11:08 danieldk

Isn't it more along the lines of set_activations_to_store or set_activations_to_save or set_store_activations_setting?

Aug 08 '22 09:08 adrianeboyd

Isn't it more along the lines of set_activations_to_store or set_activations_to_save or set_store_activations_setting?

Changed to save_activations. Since it can now only be enabled/disabled, it is a regular property again (no set_.*). So, this functionality can now be enabled with:

pipe.save_activations = True

Aug 30 '22 08:08 danieldk

Note: the MyPy checks are failing because we need a Thinc version that includes https://github.com/explosion/thinc/pull/739.

Aug 30 '22 09:08 danieldk

Please allow me to ask this :trollface: question. Isn't the store_activations mechanism kind of a temporary stand-in for a more general GlobalListener that is kinda like an ActivationPool that components read from, write to and clear? Like the global memory of the pipeline is the Doc and this would be the global memory for the model part of the pipeline? Isn't that the kind of stuff we are missing? I guess ideally any later model could query any tensor produced by earlier models, and backprop should be supported through this ActivationPool. Something like this? We've discussed something like this with @danieldk for a second at some point.

Sep 07 '22 14:09 kadarakos

Please allow me to ask this :trollface: question. Isn't the store_activations mechanism kind of a temporary stand-in for a more general GlobalListener that is kinda like an ActivationPool that components read from, write to and clear? Like the global memory of the pipeline is the Doc and this would be the global memory for the model part of the pipeline? Isn't that the kind of stuff we are missing? I guess ideally any later model could query any tensor produced by earlier models, and backprop should be supported through this ActivationPool.

But in my use case, it is not a model that uses the activations, but a pipe. Also, I think activations could be useful to downstream applications that are not pipes or models (e.g. some custom code that the user wrote). So, it seems most logical to associate them with a Doc? Otherwise the Language.pipe method should also return other things than Docs.
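A rough sketch of that argument, again in plain Python rather than spaCy: toy_pipe stands in for Language.pipe, ToyDoc for Doc, and uncertain_tokens for some custom user code. All three names, the made-up probability rows, and the "probabilities" key are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List


@dataclass
class ToyDoc:
    """Hypothetical stand-in for spaCy's Doc (not the real class)."""
    words: List[str]
    activations: Dict[str, Dict[str, list]] = field(default_factory=dict)


def toy_pipe(texts: Iterable[str]) -> Iterator[ToyDoc]:
    """Stand-in for Language.pipe: it yields Docs and nothing else."""
    for text in texts:
        doc = ToyDoc(words=text.split())
        # Pretend a tagger ran: one made-up probability row per token.
        rows = [[0.9, 0.1] if len(w) % 2 else [0.55, 0.45] for w in doc.words]
        doc.activations["tagger"] = {"probabilities": rows}
        yield doc


def uncertain_tokens(doc: ToyDoc, threshold: float = 0.6) -> List[str]:
    """User code that is neither a pipe nor a model.

    Flags tokens whose best probability falls below the threshold.
    """
    rows = doc.activations.get("tagger", {}).get("probabilities", [])
    return [w for w, row in zip(doc.words, rows) if max(row) < threshold]


flagged = [uncertain_tokens(doc)
           for doc in toy_pipe(["store the activations", "in a Doc"])]
print(flagged)
```

Because the activations travel with each Doc, the return type of the pipeline call stays the same and the downstream function needs no handle on the pipe or model that produced them.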

Sep 07 '22 14:09 danieldk

Okok I see (we can talk about it later just don't want to spam more :D )

Sep 07 '22 16:09 kadarakos
