python-corenlp-protobuf
edu.stanford.nlp.pipeline.Document vs. stanfordnlp.protobuf.Document
Hi, forgive me for asking stupid question(s), but why are these two classes named the same thing yet function differently and use different naming conventions? 🙃
Why does the CoreNLP Server return stanfordnlp.protobuf.Document while the pipeline returns edu.stanford.nlp.pipeline.Document? Are there any helper functions that can convert between the two?
Lastly, is the source of the dependency (in the protobuf.Document) the same as the governor of the dependency (in the pipeline.Document) or is it the dependent?
Appreciate your time and all the work y'all have done!! :)
Hi Scott,
I'm a little confused -- where did you receive a class of type stanfordnlp.protobuf.Document? Is this when using the stanfordnlp Python package? This object is a serialized protobuf object, an efficient binary representation that is shared between Python and Java. The edu.stanford.nlp.pipeline.Document is a plain old Java object that is used with Stanford's CoreNLP Java package; stanza (the new name for stanfordnlp) has a similar wrapper.
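For concreteness, deserializing one of these serialized documents in Python looks roughly like this (a minimal sketch following this package's README; `document.dat` is a placeholder filename):

```python
from corenlp_protobuf import Document, parseFromDelimitedString

# document.dat stands in for a file containing a length-prefixed,
# serialized Document written by CoreNLP's ProtobufAnnotationSerializer.
with open('document.dat', 'rb') as f:
    buf = f.read()

doc = Document()
parseFromDelimitedString(doc, buf)

# The deserialized object exposes annotations as plain protobuf fields.
print(doc.text)
print([token.word for token in doc.sentence[0].token])
```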
Hi Arun, I've been experimenting with a couple of libraries as I'm trying to do some multiprocessing with the parse command. I originally started with the stanfordnlp package, but found that when I called the pipeline's `__call__` method on a text within a worker in a Pool, it would re-load all the models, which wasted a lot of time.

Then I came across a post by someone else attempting a similar thing who found a workaround using the CoreNLP server, making async request calls and using partial: https://github.com/stanfordnlp/stanza/issues/177

I did some experimenting on my own and found that going that route with the parse command produced stanfordnlp.protobuf.Document objects (and this repo is focused on protobuf stuff). Is it the case that the stanfordnlp.protobuf.Document class (which is generated by protobuf) is derived from the work in this repo? Do you know why there are differences in the attributes of each class? I thought you might have familiarity with the stanfordnlp Python package as well and could explain how the protobuf class came to differ from the one produced by the pipeline. Any information about that would be awesome; appreciate your time.
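For context, the server-based route looks roughly like this (a sketch using stanza's CoreNLPClient; the annotator list and example text are illustrative):

```python
from stanza.server import CoreNLPClient

# By default, annotate() returns the serialized protobuf Document,
# not a Java edu.stanford.nlp.pipeline.Document.
with CoreNLPClient(annotators=['tokenize', 'ssplit', 'pos', 'depparse'],
                   timeout=30000, memory='4G') as client:
    ann = client.annotate('Stanford is a university in California.')
    print(type(ann))  # a protobuf Document class, e.g. CoreNLP_pb2.Document
```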
Apologies; yes, the protobufs in this package and the ones in stanfordnlp (or stanza) are the same: both are derived from the protobuf definitions in CoreNLP. If there are differences in attributes, it's probably because this package is slightly out of date. I'll mark that as a TODO for me.
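If you want to see exactly which attributes diverge, protobuf's descriptor API makes the comparison mechanical. A sketch, where the module paths (corenlp_protobuf.CoreNLP_pb2 and stanza.protobuf) are assumptions and may differ in your installation:

```python
# Compare the generated Document classes field-by-field to spot fields
# added to CoreNLP.proto since this package was last regenerated.
# Both import paths are assumptions; adjust to your installed packages.
from corenlp_protobuf import CoreNLP_pb2 as old_pb2
from stanza.protobuf import CoreNLP_pb2 as new_pb2

old_fields = {f.name for f in old_pb2.Document.DESCRIPTOR.fields}
new_fields = {f.name for f in new_pb2.Document.DESCRIPTOR.fields}
print(sorted(new_fields - old_fields))  # fields missing from this package
```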
Oh, okay, great! Also, can you confirm: does “source” refer to the dependent and “target” refer to the governor? It seems that the Dependency objects also differ from those in the stanfordnlp package, and I cannot find any documentation stating the comparisons/contrasts. In fact, if you, or parties you think are more relevant, could produce a comparison table of the Document/Sentence/Dependency objects in stanfordnlp and stanza vs. the protobuf-generated classes, that would be very awesome and useful for many users, I bet. :) Maybe post it on all the related GitHub projects / FAQ pages or something.
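For anyone who lands here with the same question, you can inspect the edges directly. The sketch below assumes `doc` is a protobuf Document annotated with depparse; in CoreNLP's proto each DependencyGraph edge stores source and target token indices, and CoreNLP's serializer writes the governor as source and the dependent as target, though that is worth verifying against your CoreNLP version:

```python
# Print each dependency edge of the first sentence as dep(governor, dependent).
# `doc` is assumed to be a protobuf Document with depparse annotations;
# edge.source and edge.target are 1-based token indices.
sentence = doc.sentence[0]
for edge in sentence.basicDependencies.edge:
    gov = sentence.token[edge.source - 1].word   # source: governor (head)
    dep = sentence.token[edge.target - 1].word   # target: dependent
    print(f'{edge.dep}({gov}, {dep})')
```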