
Overfitting with CNN model?

Open datistiquo opened this issue 4 years ago • 4 comments

Hey,

I'm trying the CNN model on my own data and I don't understand what is going on. I really hope you can give me some advice.

I use the model for sentence matching in IR. I get good results on the training data, but for out-of-scope inputs I get very high confidences for unrelated sentences. Even for an empty string I get a confidence of 1 for several sentences!

I don't have much data, so I use augmentation. Do you have a recipe for the augmentation?

Thank you!

datistiquo avatar Aug 04 '19 12:08 datistiquo

Hi @datistiquo ,

Sorry for the late response. Have you tried any regularization techniques? And have you seen overfitting only with the CNN?

Looking at the model configuration, you can see that dropout is disabled by default for CNNs. During implementation I was not sure whether dropout is a good regularization technique for this kind of model (siamese nets), so I disabled it by default. Dropout may still be useful for specific layers, but I haven't investigated that.

The second important thing that comes to mind is the maximum length of the training sequences. Imagine you have a small training dataset in which one or a few sentences are very long, say 50 tokens, while the rest (including those in the test set) are short. In that case the short sentences are padded with many placeholder tokens, and the padding can become a strong signal in the final decision. This area is also worth investigating.
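A minimal sketch of the padding effect described above, in plain Python. The token counts are made up for illustration, and `pad_fraction` is a hypothetical helper, not part of multihead-siamese-nets. It only shows how much of each batch becomes padding when every sentence is padded to the longest one, compared with capping the length.

```python
def pad_fraction(lengths, max_len):
    """Share of positions that are padding when every sequence is
    truncated or padded to exactly max_len tokens."""
    total = len(lengths) * max_len
    real = sum(min(n, max_len) for n in lengths)
    return (total - real) / total


# Hypothetical token counts: mostly short sentences plus one 50-token outlier.
lengths = [6, 8, 5, 7, 9, 6, 50]

full = pad_fraction(lengths, max(lengths))  # pad everything to the longest sentence
capped = pad_fraction(lengths, 10)          # cap the sequence length at 10 tokens

print(f"padding at max_len={max(lengths)}: {full:.0%}")
print(f"padding at max_len=10: {capped:.0%}")
```

If most of the input is padding, capping the maximum length (or masking padded positions, where the model supports it) is one thing to try.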

I hope it will help, BR Tomasz

tlatkowski avatar Aug 18 '19 12:08 tlatkowski

I will check this.

I also think that the margin plays a huge role with contrastive loss.
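To make the role of the margin concrete, here is a sketch of one common form of the contrastive loss for a single pair (squared distance for matching pairs, squared hinge on the margin for non-matching pairs). This is an assumption about the formulation; the loss actually used in the repository may differ, for example by a constant factor.

```python
def contrastive_loss(distance, label, margin=1.0):
    """Contrastive loss for one pair.

    label = 1 for a matching pair, 0 for a non-matching pair.
    Matching pairs are pulled together; non-matching pairs are only
    penalized while their distance is still below the margin.
    """
    if label == 1:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# The same non-matching pair at distance 0.8 under different margins.
d = 0.8
for margin in (0.5, 1.0, 2.0):
    print(margin, contrastive_loss(d, 0, margin))
```

With a small margin the non-matching pair contributes no loss at all, so unrelated sentences are never pushed further apart once they clear that threshold.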

Actually, have you normalized your word vectors before feeding them in? Maybe that is my issue too, since I have not normalized mine. I will try this out.
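For reference, a minimal sketch of L2-normalizing a word vector before it goes into the network. `l2_normalize` is a hypothetical helper written in plain Python; in practice this would more likely be done once over the whole embedding matrix.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length.

    eps guards against division by zero for an all-zero vector
    (for example a padding or unknown-word embedding).
    """
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


v = l2_normalize([3.0, 4.0])
print(v, math.sqrt(sum(x * x for x in v)))
```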

datistiquo avatar Oct 18 '19 13:10 datistiquo

Right now I use a simple MSE or a simple contrastive loss. But I feel that I need a pairwise, triplet, or even listwise loss to do better?

Also, my evaluation metric is just precision, but I think a ranking metric like precision at k is more reasonable for IR!
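A minimal sketch of precision at k for a single query, assuming the model's scores have already been turned into a ranked list of candidate ids. The ids and the `precision_at_k` helper are illustrative only.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked candidates that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranking for one query, best match first.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2"}

for k in (1, 3, 5):
    print(k, precision_at_k(ranked, relevant, k))
```

Averaging this value over all evaluation queries gives a single number that reflects ranking quality rather than per-pair classification.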

Frank-Sin99 avatar Oct 20 '19 17:10 Frank-Sin99

Hey @tlatkowski, why does your CNN network use just the distance as its output? Have you tried feeding the distance into a sigmoid layer? Or using a sigmoid layer directly instead of the distance?
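To illustrate what is meant here, a rough sketch of a logistic (sigmoid) output on top of the scalar distance, trained with binary cross-entropy. The `weight` and `bias` values are placeholders for parameters that would be learned; nothing here is taken from the repository.

```python
import math


def match_probability(distance, weight=-4.0, bias=2.0):
    """Sigmoid over a scalar distance.

    A negative weight means larger distances map to lower match
    probabilities. weight and bias are placeholder values that would
    normally be learned together with the rest of the network.
    """
    return 1.0 / (1.0 + math.exp(-(weight * distance + bias)))


def binary_cross_entropy(prob, label, eps=1e-7):
    """Cross-entropy between a predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


for d in (0.0, 0.5, 1.0):
    p = match_probability(d)
    print(d, round(p, 3), round(binary_cross_entropy(p, 1), 3))
```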

Frank-Sin99 avatar Nov 03 '19 16:11 Frank-Sin99