pytorch-struct icon indicating copy to clipboard operation
pytorch-struct copied to clipboard

[Bug?] Positive probabilities in Alignment CRF when length nontrivial

Open tyang92 opened this issue 5 years ago • 10 comments

Hello! First thanks for this awesome project. I think it will make a big difference in the NLP community and I am already excited to use it in my work.

Problem TLDR: AlignmentCRF.log_prob returns values greater than 0 (marginal probabilities greater than 1. I think this might be a bug unless I'm misusing the API.

More background

I'm attempting to use an Alignment CRF (smith waterman) as the loss function for a generative model. The "true" labels are sequences of tokens of various length, and I like the alignment because it will still work if my model accidentally adds an extra token. Obviously my data is variable length, so I provide the lengths tensor argument which has the length of my true labels in each batch. When I do this and compute dist.log_prob(dist.argmax) I get some batches who have log likelihood which is very positive. I checked the partition function and it looks very negative, so my guess is the error might have to do with masking in the partition but not the score or vice-versa.

To reproduce this error

I get the same are in the CTC.ipynb example in the notebooks folder when I initialize the distribution with dist = torch_struct.AlignmentCRF(log_potentials, lengths=torch.Tensor([t] * 2 + [t-1] * 3).long()) and then dist.log_prob(dist.argmax). Image of the error is below. Here is a collab notebook which has the error https://colab.research.google.com/drive/1C1uDWNe8IcXB-Re6WcnwRsif-q-JQySe.

image

Do you agree this is a bug or is this the intended behavior? Any suggestions for how I might debug this or correctly use your API would be much appreciated. I am not familiar with SemiRings but if you explain what might fix the issue I can try to submit a pull request (still fairly new to pytorch so I'm not sure how much help I can be).

Thanks for your attention and again for the great library! Tim Y.

tyang92 avatar Jan 28 '20 00:01 tyang92

Yup, seems like a bug. It seems like it is not finding any valid solution at all when you set length < t.

Is there anyway you could avoid using lengths on alignment? The Alignment code is a bit gnarly, so I am guessing I just messed up the length constraint. An alternative method is just to pad your sequences with matches on the boundaries.

srush avatar Jan 28 '20 03:01 srush

Hello Dr. Rush,

Many thanks for your rapid response and diagnosis of the problem. My data is intrinsically variable length, so I appreciate your help thinking through alternatives to lengths while the bug stands.

If I understand your suggestion, you are recommending I create fake log_potentials by padding both my ground truth sequences and predicted logits with match values? How do I determine what the match value should be? (in your example you use the predicted logit at the true label index). Should I just use 0 so that exp(0) = 1 (certain match). How would this effect dist.argmax? And is it the case that if I follow this procedure exactly my log_marginal probability should not change (intuitively, it seems like this should change the partition function and therefore my marginal)

Thank you again! Tim Y.

tyang92 avatar Jan 28 '20 15:01 tyang92

Dear Dr. Rush,

First, thank you for writing the excellent arxiv explainer on this library! It very much helped me understanding. I have experimented with padding with matches, but I am not confident in my choice of log potentials as I still do not fully understand how this will effect the partition function and argmax. I wonder if you might confirm that 0 is the proper choice of log potential for certain match? Or perhaps it would be possible to write a wrapper which does this padding automatically for a given length constraint.

Either way, thank you and hope to hear from you soon. Best, Tim

tyang92 avatar Feb 19 '20 14:02 tyang92

Hi Tim,

Sorry for the slow reply. I started looking at the issue, but I am still not happy with the speed of this particular function and am worried it might not useful. Are you using it in a production system or a demo?

Either way I will try to get a padded demo this week, but I want to make sure you are not waiting on a function that will not be directly applicable.

srush avatar Feb 19 '20 16:02 srush

Hi Dr. Rush, Thanks so much for the info. My project is a research prototype so a big performance hit should be just fine if I can get something running at all.

I look forward to the padding demo, and once again very grateful for your help! Tim

tyang92 avatar Feb 19 '20 16:02 tyang92

Hi Dr. Rush and struct team,

Hope all is well with the virus and what not. I know it must be very hectic for you, but I wanted to check in on this. I'm now basically stuck at home and have a lot of time for this project and was hoping I could get a prototype of the alignment working.

Let me know if you have any updates or could point me in the right direction. Thank you! Tim

tyang92 avatar Mar 17 '20 12:03 tyang92

I'm so sorry, things have been a bit crazy. I will try to get to this this week.

srush avatar Mar 18 '20 15:03 srush

Hi, I finally got around to making you a full example:

https://github.com/harvardnlp/pytorch-struct/blob/master/notebooks/CTC_with_padding.ipynb

image

Let me know if it is clear. I will eventually get his checked in under the lengths command.

srush avatar Mar 23 '20 03:03 srush

This looks like exactly what I needed! Thank you SO much Dr. Rush, best wishes and stay safe

tyang92 avatar Mar 24 '20 17:03 tyang92

Awesome. I am going to leave it open so I remember to build it in.

srush avatar Mar 24 '20 17:03 srush