
Extensions

Open pythonometrist opened this issue 4 years ago • 13 comments

Thanks to your help - I have added custom losses, special initialization and a bunch of other things as extensions.

I am now trying to mess with the sentence classification model itself. It is a linear layer on top of the BERT model. What I would like to do is a) freeze all of BERT, and b) add a CNN on top, along the lines of https://github.com/Shawn1993/cnn-text-classification-pytorch/blob/master/model.py

I want to compare results with a frozen and an unfrozen BERT. Any pointers would be most appreciated.

pythonometrist avatar Sep 24 '19 18:09 pythonometrist

Should be pretty similar to adding custom losses. You can freeze all the layers by setting requires_grad = False for all of them in your subclassed model. You can add your convolutional layers to it as well, and define how you want them to be used in the forward method. Hopefully, it won't mess with loading the weights from the pretrained model. I don't think it will.
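A minimal sketch of what that subclass might look like (assuming a Hugging Face BertModel and a made-up Conv1d head; the layer sizes and class name are illustrative, not the exact model in this repo):

```python
import torch
import torch.nn as nn
from pytorch_transformers import BertModel

class BertCnnClassifier(nn.Module):
    def __init__(self, num_labels, freeze_bert=True):
        super(BertCnnClassifier, self).__init__()
        self.bert = BertModel.from_pretrained('bert-base-uncased')
        if freeze_bert:
            # Freeze every BERT parameter so only the CNN head trains
            for param in self.bert.parameters():
                param.requires_grad = False
        # Hypothetical CNN head: convolve over the token dimension
        self.conv = nn.Conv1d(in_channels=768, out_channels=128, kernel_size=3, padding=1)
        self.classifier = nn.Linear(128, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # sequence_output: (batch, seq_len, hidden_size)
        sequence_output = self.bert(input_ids, attention_mask=attention_mask)[0]
        # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(sequence_output.transpose(1, 2)))
        # Max-pool over the sequence, then classify
        x = torch.max(x, dim=2)[0]
        return self.classifier(x)
```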

ThilinaRajapakse avatar Sep 24 '19 19:09 ThilinaRajapakse

Cool - let me try it out. While config.hidden_size is the size of the last layer from BERT (and in some sense the size of my embedding), I am struggling to figure out the size of the vocabulary. It's probably the BERT vocabulary size hiding somewhere in the config. max_seq_length is user specified, so we can already assume padded sequences. Agreed, the rest is carefully initializing the model and writing up the forward correctly (which might be non-trivial for me!). Let me get back to you. Thanks.
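For reference, a quick way to check both sizes, assuming the pytorch-transformers BertConfig attribute names:

```python
from pytorch_transformers import BertConfig

config = BertConfig.from_pretrained('bert-base-uncased')
print(config.hidden_size)  # 768   - size of each BERT output vector
print(config.vocab_size)   # 30522 - WordPiece vocabulary size
```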

pythonometrist avatar Sep 24 '19 19:09 pythonometrist

If it doesn't work, you can always decouple BERT and the CNN and just feed the BERT outputs to the CNN.

I'm no expert myself, but you seem to be doing fine to me!

ThilinaRajapakse avatar Sep 24 '19 19:09 ThilinaRajapakse

Well - I got a model to work with some simple linear layers, so that is progress. I need to work out tensor sizes - BERT is sending out tensors of shape (64 x 768), where 64 is the batch size, so I assume for each sentence I am receiving one embedding of size 768. I've got to figure out how to go from there to a Vocabulary x Document matrix - I think it means that somewhere BERT is averaging over the words. Or I could simply forget about word embeddings and do a 1D convolution at the document level... will think some more and update.
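A quick shape check may help sort out which tensor is which (a sketch, assuming the pytorch-transformers BertModel returns the token-level sequence output first and the pooled sentence output second):

```python
import torch
from pytorch_transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
bert = BertModel.from_pretrained('bert-base-uncased')
bert.eval()

input_ids = torch.tensor([tokenizer.encode("An example sentence to classify.")])

# BERT as a pure feature extractor: no gradients flow through it
with torch.no_grad():
    sequence_output, pooled_output = bert(input_ids)[:2]

# pooled_output:   (batch, 768)          one vector per sentence (the 64 x 768 tensor)
# sequence_output: (batch, seq_len, 768) one vector per token - this is what a
#                                        Conv1d over the sequence would consume
print(pooled_output.shape, sequence_output.shape)
```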

pythonometrist avatar Sep 24 '19 23:09 pythonometrist

You da boss. Yep, you can do all sorts of models once you realize they offer up access to all the layers to convolve/LSTM over. I am curious about the apex installation - one version seems to be pure Python while the other uses a C++ compiler - which one do you use?

pythonometrist avatar Sep 26 '19 15:09 pythonometrist

Great!

I use the Apex version with C++ extensions. The pure python version is lacking a few features. I don't see any reason not to use the C++ version.

ThilinaRajapakse avatar Sep 26 '19 15:09 ThilinaRajapakse

I am having some issues with apex on a Debian server... well, fingers crossed. Thanks for all the input! I had been wanting to get into PyTorch for a while and now I am in!

pythonometrist avatar Sep 27 '19 00:09 pythonometrist

Odd. I never had issues with any Ubuntu-based distros.

Welcome to PyTorch!

ThilinaRajapakse avatar Sep 27 '19 02:09 ThilinaRajapakse

Thanks - it's a server which is stuck on pip 8.1, but it looks like I could get it to work with conda. Fingers crossed.

pythonometrist avatar Sep 27 '19 05:09 pythonometrist

OK, it works with conda!!! Should apex keep_batchnorm_fp32 be True? And O1 vs O2 - which one worked for you?

pythonometrist avatar Sep 28 '19 01:09 pythonometrist

I don't think I changed batchnorm. Doesn't it get set when you change the opt level? I used O1. O2 was giving me NaN losses.
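For reference, the opt level is just the flag passed to amp.initialize. A minimal sketch of the Apex usage with a toy model (the model and optimizer here are placeholders, not the repo's training loop):

```python
import torch
from apex import amp

# Placeholder model/optimizer, just to show the call
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# O1 patches Torch functions for mixed precision; batchnorm handling and
# loss scaling are left at the defaults Apex prints at initialization
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(4, 10).cuda()).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```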

ThilinaRajapakse avatar Sep 28 '19 02:09 ThilinaRajapakse

Defaults for this optimization level are:

enabled               : True
opt_level             : O1
cast_model_type       : None
patch_torch_functions : True
keep_batchnorm_fp32   : None
master_weights        : None
loss_scale            : dynamic

That is the default when I run the models - not sure if keep_batchnorm_fp32 : None should be something else. I'll dig around and report.

pythonometrist avatar Sep 28 '19 03:09 pythonometrist

Yeah, I just kept the defaults there.

ThilinaRajapakse avatar Sep 28 '19 03:09 ThilinaRajapakse