
padding, softmax, embeddings

ka-bu opened this issue 6 years ago · 5 comments

Hi,

I have two questions regarding the CAML implementation:

  1. All the texts in a batch are padded, but the input to the softmax function is not masked. Hence, this implementation also assigns positive attention weights to padding tokens, right? Am I missing something here? (A rough masking sketch follows below this list.)
  2. The embedding vector for the padding token does not seem to be fixed to the zero vector. If it is, where is that constraint implemented? (I guess it wouldn't make a difference if 1. were handled differently, i.e. if the attention weights for padding tokens were fixed to 0.)
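
For concreteness, something like the following is what I mean by masking the softmax input. This is only a rough sketch, not code from the repo; the function and variable names (`masked_softmax`, `scores`, `lengths`) are mine, and it assumes `lengths` holds the true (unpadded) length of each sample in the batch:

```python
import torch
import torch.nn.functional as F

def masked_softmax(scores, lengths):
    """Softmax over the last dim, with padded positions forced to ~0 attention.

    scores:  [batch, ..., seq_len] raw attention logits
    lengths: [batch] true (unpadded) length of each sample
    """
    seq_len = scores.size(-1)
    # mask[b, t] is True for real tokens, False for padding
    mask = torch.arange(seq_len, device=scores.device)[None, :] < lengths[:, None]
    # broadcast the mask over any extra dims (e.g. a per-label dimension)
    while mask.dim() < scores.dim():
        mask = mask.unsqueeze(1)
    # set padded logits to -inf so softmax assigns them ~0 weight
    scores = scores.masked_fill(~mask, float('-inf'))
    return F.softmax(scores, dim=-1)
```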

Many thanks!

ka-bu avatar Nov 20 '18 17:11 ka-bu

Should be fixed by the above PR, although in my experience this doesn't really change the results.

sarahwie avatar Dec 10 '18 18:12 sarahwie

No, the PR doesn't fix everything. In my experience, fixing the embedding of the padding tokens does not change much, but masking the softmax input does.
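
For reference, keeping the padding embedding fixed at zero is a one-liner in PyTorch via `padding_idx`. The sketch below is just an illustration and assumes index 0 is the padding id, which may differ in this codebase:

```python
import torch.nn as nn

vocab_size, embed_dim = 50000, 100  # illustrative values only
# padding_idx initializes that row to zeros and keeps its gradient at zero,
# so the padding embedding stays fixed at the zero vector during training.
embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
```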

ka-bu avatar Dec 10 '18 20:12 ka-bu

I see what you mean. I'll look into it.

sarahwie avatar Dec 10 '18 20:12 sarahwie

I have the same question about taking the softmax to compute attention weights. I rewrote my code to explicitly truncate each sample in the batch (quite inefficient). Preliminary results show about a 3-4% drop for the simple case of the base CNN with the 50 most common labels. Would anyone be able to chime in on this issue? Thanks.

datduong avatar Jun 03 '19 22:06 datduong

This line still does not use any masking when computing the attention weights: https://github.com/jamesmullenbach/caml-mimic/blob/master/learn/models.py#L184
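
If it helps, a masked variant of that per-label attention could look roughly like the sketch below. The tensor names and shapes are my assumptions about the model (encoder output `x`, per-label attention matrix `U_weight`), not the repo's actual code:

```python
import torch
import torch.nn.functional as F

def caml_attention_with_mask(x, U_weight, lengths):
    """CAML-style per-label attention with padding positions masked out.

    x:        [batch, seq_len, dim]   encoder output (e.g. after the conv layer)
    U_weight: [num_labels, dim]       per-label attention parameters
    lengths:  [batch]                 true length of each sample
    """
    # raw attention logits: [batch, num_labels, seq_len]
    logits = U_weight.matmul(x.transpose(1, 2))
    # [batch, 1, seq_len] mask that is True on real tokens, False on padding
    seq_len = x.size(1)
    mask = (torch.arange(seq_len, device=x.device)[None, :] < lengths[:, None]).unsqueeze(1)
    logits = logits.masked_fill(~mask, float('-inf'))
    alpha = F.softmax(logits, dim=2)
    # weighted sum of token representations per label: [batch, num_labels, dim]
    return alpha.matmul(x)
```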

datduong avatar Jun 03 '19 22:06 datduong