berkeley-entity

Mention pair pruning

Open joecheriross opened this issue 9 years ago • 6 comments

Hi Greg,

When prunedEdges(i)(j) is true, does that mean the pair formed by the ith mention and the jth mention is ignored (i.e., excluded from further processing)? I got confused when I printed the mention pairs after pruning.

code snippet

def printPrunedEdges(docGraphs: Seq[DocumentGraph]): Unit = {
  for (docGraph <- docGraphs) {
    println("PRUNED EDGES")
    // j1 indexes the first mention of the pair, j2 the second
    for (j1 <- 0 until docGraph.prunedEdges.size) {
      for (j2 <- 0 until docGraph.prunedEdges(j1).size) {
        // Print only the pairs whose edge is flagged as pruned
        if (docGraph.prunedEdges(j1)(j2)) {
          println(j1 + " " + docGraph.getMention(j1).words + ": " + j2 + " " + docGraph.getMention(j2).words)
        }
      }
    }
  }
}

joecheriross avatar Dec 29 '15 16:12 joecheriross

Yes, that should be the case. Note that i is the index of the current mention and j is the index of the antecedent (so j < i, with j == i denoting the mention starting a new cluster).

Greg
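As an illustration of the index convention described above, here is a minimal sketch that reads a pruning mask row by row. It is not taken from berkeley-entity: the `PruneMask` alias, `keptAntecedents`, and `canStartNewCluster` are hypothetical names, and the sketch assumes that row i holds one flag per candidate j in 0..i, with j == i standing for "start a new cluster".

```scala
object PrunedEdgeSketch {
  // Hypothetical stand-in for DocumentGraph.prunedEdges (assumption: row i has
  // i + 1 entries, one per candidate antecedent j <= i; true means "pruned").
  type PruneMask = Array[Array[Boolean]]

  /** Indices j < i of earlier mentions still allowed as antecedents of mention i. */
  def keptAntecedents(mask: PruneMask, i: Int): Seq[Int] =
    (0 until i).filter(j => !mask(i)(j))

  /** True if mention i may still start a new cluster (the j == i edge is not pruned). */
  def canStartNewCluster(mask: PruneMask, i: Int): Boolean =
    !mask(i)(i)

  def main(args: Array[String]): Unit = {
    // Toy mask for three mentions; values are made up for illustration.
    val mask: PruneMask = Array(
      Array(false),
      Array(true, false),
      Array(false, true, false)
    )
    for (i <- mask.indices) {
      val kept = keptAntecedents(mask, i).mkString(",")
      println(s"mention $i: kept antecedents = [$kept], new cluster allowed = ${canStartNewCluster(mask, i)}")
    }
  }
}
```

Under this reading, the first index printed by a loop over the mask is the current mention and the second is its candidate antecedent, so a pair where both indices are equal is the "new cluster" option rather than a link between two mentions.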

gregdurrett avatar Dec 30 '15 03:12 gregdurrett

Thank you, Greg. One more question. The command line I am using has -pruningStrategy pointing to a corefprune model file. What does this mean? Is the pruning learned and stored as a model? For my purposes I am extending the distance pruning. Is this OK?

joecheriross avatar Dec 30 '15 04:12 joecheriross

Yes. That pruning method prunes according to the marginals of a pre-trained model. I mostly used it for pruning in more sophisticated stuff like the full entity system. For coref-only stuff, I only use the basic distance pruning (which in reality doesn't prune at all), so that should be fine to extend.

Greg
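To make the contrast between the two strategies concrete, here is a rough sketch of each. It is not the berkeley-entity implementation: `distanceMask`, `marginalMask`, `maxBack`, and `margin` are hypothetical names, and the rule "prune edges whose log marginal is more than `margin` below the best edge for that mention" is only one plausible way a marginal-based pruner could work.

```scala
object PruningStrategySketch {
  /** Distance-style pruning (assumed rule): prune antecedent j of mention i when
    * it lies more than maxBack mentions back. The j == i edge is always kept. */
  def distanceMask(numMentions: Int, maxBack: Int): Array[Array[Boolean]] =
    Array.tabulate(numMentions) { i =>
      Array.tabulate(i + 1)(j => j != i && i - j > maxBack)
    }

  /** Marginal-style pruning (assumed rule): logMarginals(i)(j) comes from some
    * pre-trained model; prune every edge that scores more than `margin` below
    * the best-scoring edge for the same mention. */
  def marginalMask(logMarginals: Array[Array[Double]], margin: Double): Array[Array[Boolean]] =
    logMarginals.map { row =>
      val best = row.max
      row.map(lm => lm < best - margin)
    }
}
```

In this sketch, calling `distanceMask` with `maxBack` at least as large as the number of mentions prunes nothing, which matches the remark that the basic distance pruning does not really prune. An extension of distance pruning would only need to change the predicate inside the inner `Array.tabulate`.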

gregdurrett avatar Dec 31 '15 01:12 gregdurrett

Thanks Greg. I will do that.

Sharing one observation: while experimenting with the pretrained model on my test data, I found that many of the required mention pairs are getting pruned. I have not verified this thoroughly, but I am almost sure that this is happening.

Thanks, Joe

joecheriross avatar Dec 31 '15 02:12 joecheriross

There should be some line printed (starting with the word "Pruning" I think) that tells you about this. Many gold arcs are pruned but the model is pretty good about not deleting every gold arc from a mention (as in, some gold arc should be preserved >90% of the time). And those preserved gold arcs are the ones that are picked anyway (e.g. close links for pronouns), so from the standpoint of the downstream model this is okay.

Greg
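The statistic described above (how often at least one gold arc survives for a mention) can be checked independently with a small helper. This is a sketch under assumptions, not code from the repository: `goldSurvivalRate` and the `goldAntecedents` structure are hypothetical, and the mask is again assumed to have i + 1 entries in row i with true meaning "pruned".

```scala
object GoldArcSurvivalSketch {
  /** Fraction of mentions that keep at least one gold edge after pruning.
    * goldAntecedents(i) lists the correct j values for mention i
    * (j == i meaning the mention correctly starts a new cluster). */
  def goldSurvivalRate(mask: Array[Array[Boolean]], goldAntecedents: Array[Seq[Int]]): Double = {
    require(mask.length == goldAntecedents.length, "need one gold list per mention")
    if (mask.isEmpty) 1.0
    else {
      // A mention survives if any of its gold edges is not flagged as pruned.
      val survivors = mask.indices.count(i => goldAntecedents(i).exists(j => !mask(i)(j)))
      survivors.toDouble / mask.length
    }
  }
}
```

A mention with several gold antecedents counts as covered as long as one of them remains, which is why many individual gold arcs can be pruned while this rate stays high.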

gregdurrett avatar Dec 31 '15 02:12 gregdurrett

OK, got it. The point is that although many gold arcs get pruned, final accuracy is not much affected, since the essential ones are preserved.

Thanks, Joe

joecheriross avatar Dec 31 '15 02:12 joecheriross