
update char_cnn and fasttext

Open gmichalo opened this issue 4 years ago • 3 comments

While examining the great models that you have created, I saw that the code in char_cnn (Character-level Convolutional Networks for Text Classification) does not exactly match the model that the authors describe in their paper.

In particular, in this pull request I added:

a) For the char_cnn model: 1) I changed the last (sixth) convolutional layer and the first linear layer of char_cnn to the format that the authors used. 2) I wrote the equation (lines 33, 34) for calculating the input size of the first linear layer from the maximum number of characters in a sentence. This number could be added to the config file, but I did not want to change more files, so I just added it as a constant in model.py.

b) For the FastText model:

I changed F.avg_pool2d(x, (x.shape[1], 1)).squeeze(1) to F.avg_pool1d(x, x.shape[1]).squeeze(1). The two are equivalent, but I believe F.avg_pool1d makes the code more understandable, since the current code uses F.avg_pool2d to do what is basically an avg_pool1d.

gmichalo avatar Jul 04 '20 14:07 gmichalo

Thanks for the PR. Have you run any benchmarks?

daemon avatar Aug 31 '20 13:08 daemon

For the fasttext model, I tested the model on the Reuters and AAPD datasets, and the new version gets exactly the same results as the previous version, as expected.

For the char_cnn model, the main difference between the new version and the old version is the input dimension of the first linear layer. In the old version it was fixed to 256; in the new version it is calculated as described in the paper (lines 33-36 in the code, and the last paragraph before section 2.4 in the paper), which gives 8448 for Reuters. With this change, when I tried to run on Reuters I got an F1 of 0 on dev, which I attribute to the limited data. However, if we follow the paper, I believe this is the correct way to create the char_cnn.
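As a rough illustration of that size calculation, here is a minimal sketch in plain Python. It assumes the layer layout from the paper (six stride-1, unpadded convolutions with kernel widths 7, 7, 3, 3, 3, 3, and non-overlapping pooling of width 3 after layers 1, 2 and 6) and 256 filters. The function name `fc_input_size` and the maximum length of 1000 characters are hypothetical; the thread does not state the constant used in model.py, and 1000 is simply one value that reproduces 8448.

```python
# (conv kernel width, pool width or None) for each of the six conv layers.
# Assumption: stride-1 convolutions without padding, non-overlapping pooling.
LAYERS = [(7, 3), (7, 3), (3, None), (3, None), (3, None), (3, 3)]


def fc_input_size(max_chars, num_filters=256):
    """Flattened size fed to the first linear layer for a given input length."""
    length = max_chars
    for kernel, pool in LAYERS:
        length = length - kernel + 1      # unpadded, stride-1 convolution
        if pool is not None:
            length = length // pool       # non-overlapping pooling, floor mode
    return length * num_filters


print(fc_input_size(1014))  # 8704, the paper's default input length
print(fc_input_size(1000))  # 8448, with the assumed maximum of 1000 characters
```

This agrees with the closed form in the paper, l6 = (l0 - 96) / 27, rounded down and multiplied by the number of filters.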

gmichalo avatar Sep 01 '20 22:09 gmichalo

Okay, it's probably a good idea to note in some readme that the correct implementation of char_cnn produces an F1 of 0 on Reuters, so people don't waste their time running it.

daemon avatar Sep 04 '20 22:09 daemon