self-attentive-parser
Serializing the output
Hi there, thanks for putting together this awesome repo!
I'm wondering if it's possible to save the output somewhere on disk (e.g., with pickle or spaCy's serialization methods).
For example:
import benepar, spacy
nlp = spacy.load('en_core_web_md')
nlp.add_pipe("benepar", config={"model": "benepar_en3"})
doc = nlp("The time for action is now. It's never too late to do something.")
fwn = "output.spacy"
doc.to_disk(fwn)
This yields the following error:
/envs/lib/python3.7/site-packages/torch/distributions/distribution.py:46: UserWarning: <class 'torch_struct.distributions.TreeCRF'> does not define `arg_constraints`. Please set `arg_constraints = {}` or initialize the distribution with `validate_args=False` to turn off validation.
'with `validate_args=False` to turn off validation.')
Traceback (most recent call last):
File "/envs/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3552, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-2-acc55f65a9e7>", line 1, in <module>
runfile('/parsing/serialization.py', wdir='/parsing')
File "/home//.pycharm_helpers/pydev/_pydev_bundle/pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "/home//.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/parsing/serialization.py", line 15, in <module>
doc.to_disk(fwn)
File "spacy/tokens/doc.pyx", line 1270, in spacy.tokens.doc.Doc.to_disk
File "spacy/tokens/doc.pyx", line 1271, in spacy.tokens.doc.Doc.to_disk
File "spacy/tokens/doc.pyx", line 1298, in spacy.tokens.doc.Doc.to_bytes
File "spacy/tokens/doc.pyx", line 1357, in spacy.tokens.doc.Doc.to_dict
File "/envs/lib/python3.7/site-packages/spacy/util.py", line 1263, in to_dict
serialized[key] = getter()
File "spacy/tokens/doc.pyx", line 1354, in spacy.tokens.doc.Doc.to_dict.lambda19
File "/envs/lib/python3.7/site-packages/srsly/_msgpack_api.py", line 14, in msgpack_dumps
return msgpack.dumps(data, use_bin_type=True)
File "/envs/lib/python3.7/site-packages/srsly/msgpack/__init__.py", line 55, in packb
return Packer(**kwargs).pack(o)
File "srsly/msgpack/_packer.pyx", line 285, in srsly.msgpack._packer.Packer.pack
File "srsly/msgpack/_packer.pyx", line 291, in srsly.msgpack._packer.Packer.pack
File "srsly/msgpack/_packer.pyx", line 288, in srsly.msgpack._packer.Packer.pack
File "srsly/msgpack/_packer.pyx", line 264, in srsly.msgpack._packer.Packer._pack
File "srsly/msgpack/_packer.pyx", line 282, in srsly.msgpack._packer.Packer._pack
TypeError: can not serialize 'ConstituentData' object
Is there any workaround? This would be really useful when dealing with large datasets. Thanks for any guidance!
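One possible workaround, sketched here under assumptions rather than confirmed by the maintainers: the traceback suggests that the ConstituentData object attached to the doc is what msgpack refuses to pack, so instead of serializing the whole Doc, the bracketed parse of each sentence could be saved as plain text. With benepar loaded, those strings should be available as `sent._.parse_string` for each `sent` in `doc.sents`. The helper names `save_parse_strings` and `load_parse_strings` and the JSON Lines layout are hypothetical choices for illustration, and the example trees are hand-written placeholders, not actual benepar output.

```python
import json
from pathlib import Path
from typing import Iterable, List


def save_parse_strings(parse_strings: Iterable[str], path: str) -> None:
    """Write one bracketed parse string per line, as JSON Lines."""
    with Path(path).open("w", encoding="utf-8") as f:
        for tree in parse_strings:
            f.write(json.dumps({"parse": tree}) + "\n")


def load_parse_strings(path: str) -> List[str]:
    """Read the parse strings back in their original order."""
    with Path(path).open("r", encoding="utf-8") as f:
        return [json.loads(line)["parse"] for line in f if line.strip()]


# With benepar in the pipeline, the strings would come from the parsed doc:
#   trees = [sent._.parse_string for sent in doc.sents]
# Hand-written placeholders (NOT real parser output) keep this sketch
# runnable without downloading any model.
trees = [
    "(S (NP (DT The) (NN time)) (VP (VBZ is) (ADVP (RB now))) (. .))",
    "(S (NP (PRP It)) (VP (VBZ 's) (ADVP (RB never))) (. .))",
]
```

Calling `save_parse_strings(trees, "output.jsonl")` and later `load_parse_strings("output.jsonl")` would then round-trip the trees as plain strings. This loses the link to the spaCy tokens, so it only fits cases where the parse strings themselves are what needs to be kept for a large dataset.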