python-corenlp-protobuf
edu.stanford.nlp.pipeline.Document vs. stanfordnlp.protobuf.Document
Hi, forgive me for asking stupid question(s), but why are these two classes named the same thing yet function differently and use different naming conventions? 🙃
Why does the CoreNLP Server return stanfordnlp.protobuf.Document while the pipeline returns edu.stanford.nlp.pipeline.Document? Are there any helper functions that can convert between the two?
Lastly, is the source of the dependency (in the protobuf.Document) the same as the governor of the dependency (in the pipeline.Document) or is it the dependent?
Appreciate your time and all the work y'all have done!! :)
Hi Scott,
I'm a little confused -- where did you receive a class of type stanfordnlp.protobuf.Document? Is this when using the stanfordnlp Python package? This object is a serialized protobuf object, an efficient binary representation that is shared between Python and Java. The edu.stanford.nlp.pipeline.Document is a plain old Java object that is used with Stanford's CoreNLP Java package; stanza (the new name for stanfordnlp) has a similar wrapper.
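For concreteness, deserializing one of these serialized documents in Python looks roughly like this (a minimal sketch following this package's README; `document.dat` is a placeholder filename):

```python
from corenlp_protobuf import Document, parseFromDelimitedString

# document.dat stands in for a file containing a length-prefixed,
# serialized Document written by CoreNLP's ProtobufAnnotationSerializer.
with open('document.dat', 'rb') as f:
    buf = f.read()

doc = Document()
parseFromDelimitedString(doc, buf)

# The deserialized object exposes annotations as plain protobuf fields.
print(doc.text)
print([token.word for token in doc.sentence[0].token])
```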
Hi Arun, I've been experimenting with a couple of libraries as I'm trying to do some multiprocessing with the parse command. I originally started with the stanfordnlp package, but found that when I called the pipeline's `__call__` method on a text within a worker in a Pool, it would re-load all the models, which wasted a lot of time.

Then I came across a post by someone else attempting a similar thing who found a workaround using the CoreNLP server, making async request calls and using partial: https://github.com/stanfordnlp/stanza/issues/177

I did some experimenting on my own and found that going that route with the parse command produced stanfordnlp.protobuf.Document objects (and this repo is focused on protobuf stuff). Is it the case that the stanfordnlp.protobuf.Document class (which is generated by protobuf) is derived from the work in this repo? Do you know why there are differences in the attributes of each class? I thought you might have familiarity with the stanfordnlp Python package as well and could explain how the protobuf class came to differ from the one produced by the pipeline. Any information about that would be awesome; appreciate your time.
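For context, the server-based route looks roughly like this (a sketch using stanza's CoreNLPClient; the annotator list and example text are illustrative):

```python
from stanza.server import CoreNLPClient

# By default, annotate() returns the serialized protobuf Document,
# not a Java edu.stanford.nlp.pipeline.Document.
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'depparse'],
                   timeout=30000, memory='4G') as client:
    ann = client.annotate('Stanford is a university in California.')
    print(type(ann))  # a protobuf Document class, e.g. CoreNLP_pb2.Document
```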
Apologies; yes, the protobufs in this package and the ones in stanfordnlp (or stanza) are the same: both are derived from the protobuf definitions in CoreNLP. If there are differences in attributes, it's probably because this package is slightly out of date. I'll mark that as a TODO for me.
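If you want to see exactly which attributes diverge, protobuf's descriptor API makes the comparison mechanical. A sketch, where the module paths (corenlp_protobuf.CoreNLP_pb2 and stanza.protobuf) are assumptions and may differ in your installation:

```python
# Compare the generated Document classes field-by-field to spot fields
# added to CoreNLP.proto since this package was last regenerated.
# Both import paths are assumptions; adjust to your installed packages.
from corenlp_protobuf import CoreNLP_pb2 as old_pb2
from stanza.protobuf import CoreNLP_pb2 as new_pb2

old_fields = {f.name for f in old_pb2.Document.DESCRIPTOR.fields}
new_fields = {f.name for f in new_pb2.Document.DESCRIPTOR.fields}
print(sorted(new_fields - old_fields))  # fields missing from this package
```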
Oh, okay, great! Also, can you confirm: does “source” refer to the dependent and “target” refer to the governor? It seems that the Dependency objects also differ from those in the stanfordnlp package, and I cannot find any documentation stating the comparisons/contrasts. In fact, if you, or parties you think are more relevant, could produce a comparison table of the Document/Sentence/Dependency objects in stanfordnlp and stanza vs. the protobuf-generated classes, that would be very awesome and useful for many users, I bet. :) Maybe post it on all the related GitHub projects / FAQ pages or something.
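For anyone who lands here with the same question, you can inspect the edges directly. The sketch below assumes `doc` is a protobuf Document annotated with depparse; in CoreNLP's proto each DependencyGraph edge stores source and target token indices, and CoreNLP's serializer writes the governor as source and the dependent as target, though that is worth verifying against your CoreNLP version:

```python
# Print each dependency edge of the first sentence as dep(governor, dependent).
# `doc` is assumed to be a protobuf Document with depparse annotations;
# edge.source and edge.target are 1-based token indices.
sentence = doc.sentence[0]
for edge in sentence.basicDependencies.edge:
    gov = sentence.token[edge.source - 1].word   # source: governor (head)
    dep = sentence.token[edge.target - 1].word   # target: dependent
    print(f'{edge.dep}({gov}, {dep})')
```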