spaCy
[Documentation] Serializing Pipeline unclear
Summary
This page says that, to serialize a pipeline, you use the following:
config = nlp.config
bytes_data = nlp.to_bytes()
and that you must take care of storing both yourself and then loading them back from disk.
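For reference, a minimal sketch of what that manual round trip might look like. It assumes a blank English pipeline with a sentencizer as a stand-in for a real pipeline, and the file names config.cfg and pipeline.bin are hypothetical choices, not something the page prescribes:

```python
import tempfile
from pathlib import Path

import spacy
from spacy.util import get_lang_class, load_config

# Stand-in pipeline (assumption: any real pipeline would be used here instead).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names; the caller has to pick and track both files.
    config_path = Path(tmp) / "config.cfg"
    bytes_path = Path(tmp) / "pipeline.bin"

    # Save: the config and the binary data are stored separately.
    nlp.config.to_disk(config_path)
    bytes_path.write_bytes(nlp.to_bytes())

    # Load: rebuild the pipeline from the config, then restore its data.
    config = load_config(config_path)
    lang_cls = get_lang_class(config["nlp"]["lang"])
    restored = lang_cls.from_config(config)
    restored.from_bytes(bytes_path.read_bytes())

doc = restored("Hello world. This is spaCy.")
sentences = [sent.text for sent in doc.sents]
```

The point of the sketch is only that the caller ends up managing two artifacts and the order in which they are restored.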
However, it also appears that:
nlp.to_disk('directory_name')
coupled with:
spacy.load('directory_name')
works, and this is a lot simpler. The code executes and I can call the loaded nlp object on text successfully.
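A minimal sketch of the directory-based round trip, again assuming a blank English pipeline with a sentencizer as a stand-in (this report does not say which pipeline was used) and a temporary directory in place of 'directory_name':

```python
import tempfile

import spacy

# Stand-in pipeline (assumption: any real pipeline would be used here instead).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

with tempfile.TemporaryDirectory() as directory_name:
    # One call writes the config and the pipeline data into the directory.
    nlp.to_disk(directory_name)
    # One call reads the directory back into a new pipeline object.
    reloaded = spacy.load(directory_name)

text = "Hello world. This is spaCy."
original_sents = [sent.text for sent in nlp(text).sents]
reloaded_sents = [sent.text for sent in reloaded(text).sents]
```

This only checks that the two objects agree on one small input; it is not evidence that the two approaches are equivalent for every pipeline.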
Questions
- Does this approach actually work identically?
  - If so, can we update the documentation? nlp.config and to_bytes seem like implementation details rather than the API for serializing.
  - I didn't see a mention on this page that you can load the persisted pipeline from disk with spacy.load. Should this be added?
- If this approach doesn't work, I think we should call this out and build a function/method that handles loading and saving to disk with a single call. This seems better than having to write your own disk persistence for the config and bytes object. What do you think?
Thanks!
Which page or section is this issue related to?
https://spacy.io/usage/saving-loading