spacy-experimental
spacy-experimental copied to clipboard
MultiEmbed
Embedding component that is the deterministic version of MultiHashEmbed
i.e.: each token gets mapped to an index unless they are not in the vocabulary in which case they get mapped to a learned unknown vector.
The mechanism to initialize MultiEmbed
is a bit strange. The Model
gets created first with dummy Embed
layers. Then when init
gets called MultiEmbed
expects the model.attrs["tables"]
to be already set, which provides the mapping from token attributes to indices. During initialization the dummy Embed
layers get replaced by ones that adjust their sizes to the number of symbols in the tables
.
A helper callback is provided in set_attr.py
that should be placed in the initialize.before_init
section in the config
. It can be used to set the tables
for MultiEmbed
.
Currently the token_map.py
is a script that has the structure of the usual spacy init
scrips.