
Integrate bolts + torch hub

Open edenlightning opened this issue 4 years ago • 19 comments

edenlightning avatar Dec 10 '20 16:12 edenlightning

Well, we can register the Bolts models with torch hub, but for producing the pre-trained weights we still need some heavy GPU machines...
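
For context, torch hub only needs a hubconf.py with entrypoint functions at the repo root. Something roughly like this could work; just a sketch, the entrypoint name and checkpoint path below are placeholders, not an existing file:

# hubconf.py (hypothetical) -- torch.hub discovers the callables defined here.
dependencies = ["torch", "pytorch_lightning", "pl_bolts"]

from pl_bolts.models.self_supervised import SimCLR  # just an example bolts model


def simclr_resnet50(pretrained: bool = False, **kwargs):
    """Illustrative entrypoint; kwargs are forwarded to the SimCLR constructor."""
    if pretrained:
        # Placeholder checkpoint; producing real weights is the heavy-GPU part.
        return SimCLR.load_from_checkpoint("<path-or-url-to-checkpoint>", **kwargs)
    return SimCLR(**kwargs)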

Borda avatar Dec 23 '20 00:12 Borda

Can we do the vice-versa too?

  1. Load a model from torch.hub.
  2. train / finetune with PyTorch Lightning.

I would be highly interested in implementing such a feature.
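
Roughly what I have in mind, as a minimal sketch (assuming a torchvision ResNet-18 from torch hub; num_classes and train_loader are placeholders the user provides):

import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl


class HubClassifier(pl.LightningModule):
    """Load a model from torch.hub (step 1) and fine-tune it with Lightning (step 2)."""

    def __init__(self, num_classes: int, lr: float = 1e-3):
        super().__init__()
        # Step 1: load a pre-trained backbone from torch hub.
        self.model = torch.hub.load("pytorch/vision", "resnet18", pretrained=True)
        # Swap the classification head for our own number of classes.
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)
        self.lr = lr

    def forward(self, x):
        return self.model(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)


# Step 2: train / fine-tune with the usual Lightning loop.
# trainer = pl.Trainer(max_epochs=5)
# trainer.fit(HubClassifier(num_classes=10), train_loader)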

oke-aditya avatar Jan 07 '21 14:01 oke-aditya

hi, I would like to help with this issue.

With best regards, Ranuga

Programmer-RD-AI avatar Oct 24 '21 01:10 Programmer-RD-AI

> hi, I would like to help with this issue.

Great! Let's sync up also with the Bolts refactoring =)

Borda avatar Oct 24 '21 06:10 Borda

Just for information, there is currently a refactor of torchvision.models going on, available in the prototype folder.

So the API with the hub might change.

Edit: Also a small note, torchvision detection models do not work with Hub.

Let me know if I can help.

P.S. A book on PyTorch Lightning will be out at the end of this year!

oke-aditya avatar Oct 24 '21 06:10 oke-aditya

I will start working on this.

:)

With best regards, Ranuga

Programmer-RD-AI avatar Oct 24 '21 07:10 Programmer-RD-AI

Hi, I want to know: what is the issue with using a torch hub model in PyTorch Lightning and fine-tuning it?

With best regards, Ranuga

Programmer-RD-AI avatar Oct 25 '21 04:10 Programmer-RD-AI

Torch hub allows you to load the model, but you need to do model surgery to specify the number of classes, etc.

I have an example for DeTR.

https://github.com/oke-aditya/quickvision/blob/master/quickvision/models/detection/detr/model_factory.py

We can load the DETR backbone, but we need to adjust the head classifier for our own number of classes.

Similarly for CNNs, one needs to load the backbone and modify the head classifier for a custom num_classes. You also need to freeze / unfreeze layers during transfer learning and fine-tuning.
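
To make the surgery concrete, the DETR case looks roughly like this (a sketch of what the factory above does; I'm assuming the facebookresearch/detr hub entrypoint and its class_embed head):

import torch
import torch.nn as nn

num_classes = 5  # example value

# Load pre-trained DETR (ResNet-50 backbone) from torch hub.
model = torch.hub.load("facebookresearch/detr", "detr_resnet50", pretrained=True)

# DETR's classifier predicts num_classes + 1 (the extra slot is "no object"),
# so we replace class_embed with a head for our own classes.
hidden_dim = model.class_embed.in_features
model.class_embed = nn.Linear(hidden_dim, num_classes + 1)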

We can think about this a little bit more; this is something Flash does well, I think.

cc @Borda @kaushikb11 @Programmer-RD-AI @akihironitta

oke-aditya avatar Oct 25 '21 19:10 oke-aditya

ok, thank you @oke-aditya I will try to fix the issue.

Programmer-RD-AI avatar Oct 26 '21 03:10 Programmer-RD-AI

Since a single PR will not be a solution, I would suggest proposing a brief prototype (probably a branch here or a new repo) and letting the maintainers have a look. I would also suggest checking over Slack / with Borda whether this is part of the PL plans for moving ahead with bolts.

oke-aditya avatar Oct 26 '21 04:10 oke-aditya

ok thank you @oke-aditya

Programmer-RD-AI avatar Oct 26 '21 05:10 Programmer-RD-AI

Hi, I am currently building a demo of this, and my question is whether I can do the following:

from torch.nn import Linear
from torchvision.models import googlenet

model = googlenet().to(device)  # `device` is defined elsewhere
print(model)  # prints the model architecture
model.fc = Linear(1000, len(classes))  # `classes` is my list of class names

and then use the model as usual? I am just a bit confused, that's why. Thank you.

Programmer-RD-AI avatar Oct 26 '21 07:10 Programmer-RD-AI

Yes, you can, and this is the correct way. But note that the fc attribute applies to GoogLeNet and ResNet; for models like MobileNet it is called classifier or something else (please check). For CNNs it is simple to just modify the last layer to support a different number of classes.
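
For example (a small sketch; the attribute names are the ones torchvision uses, but please double-check for your model / version):

import torch.nn as nn
from torchvision.models import resnet50, mobilenet_v2

num_classes = 10  # example value

# ResNet / GoogLeNet expose the head as `fc`.
resnet = resnet50(pretrained=True)
resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)

# MobileNetV2 exposes it as `classifier` (a Sequential whose last layer is the Linear).
mobilenet = mobilenet_v2(pretrained=True)
in_features = mobilenet.classifier[-1].in_features
mobilenet.classifier[-1] = nn.Linear(in_features, num_classes)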

oke-aditya avatar Oct 26 '21 08:10 oke-aditya

Hi, I usually add, in __init__:

self.output = Linear(1000, len(classes))

and in forward:

preds = self.tl_model(X)
preds = self.output(preds)

I don't know if this is the best way, but when I am testing TL models this is what I use.

Programmer-RD-AI avatar Oct 26 '21 08:10 Programmer-RD-AI

Hi! I think you are adding an additional Linear layer on top of the existing fully connected layer. This is not the best way to do transfer learning, although it would work fine in practice; you just get an extra fully connected layer, which means an addition of roughly 1000 * num_classes parameters (the previous fc layer is Linear(x, 1000) and you now have Linear(1000, num_classes) on top of it).

The best way is to edit the existing layer, replacing it with Linear(in_features, num_classes), where in_features is the input size of the original fc layer. This does not increase the number of parameters drastically.
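
In code the difference looks roughly like this (a sketch for the GoogLeNet example above):

import torch.nn as nn
from torchvision.models import googlenet

num_classes = 10  # example value
model = googlenet(pretrained=True)
in_features = model.fc.in_features  # 1024 for GoogLeNet

# What you did: keep the old 1000-way fc and stack an extra layer on top
# (adds roughly 1000 * num_classes extra parameters):
# model.fc = nn.Sequential(model.fc, nn.Linear(1000, num_classes))

# The usual way: replace the existing fc outright.
model.fc = nn.Linear(in_features, num_classes)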

Thanks for asking

oke-aditya avatar Oct 26 '21 08:10 oke-aditya

Hi, sorry for asking so many questions, but I am confused, that's why.

For freezing layers:

model = googlenet()
model.some_fc.requires_grad = False

and for fine-tuning, going from

model.some_fc = Linear(512, 985)

to

model.some_fc = Sequential(Linear(512, 1024), Linear(1024, 985))

So what are the features I need to create?

I am sorry for asking so many questions.

Thank you.

Programmer-RD-AI avatar Oct 26 '21 09:10 Programmer-RD-AI

Ok so let me elaborate a bit more.

Let me explain the transfer learning scenarios. These examples are written for CNNs, but they kind of generalize to other models too. Note that when we are doing transfer learning, it means we are using the pre-trained weights; hence pretrained=True in all cases.

The first two scenarios are clearly described in the transfer learning tutorial (a great one by @chsasank, one of the best in this field!).

  1. Simply re-training the model with pretrained=True.

This is the simplest approach; we aren't freezing the backbone. Refer to this part of the tutorial.

import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)  # num_classes = your number of target classes

Simply train the model. We train each and every parameter, the only difference being that we now have num_classes outputs instead of 1000. A naive approach; it works fine and can give you decent results. It will take a lot of time though (you are training the whole model anyway, and have a lot of parameters to train).

  2. Training only the head, using the backbone as a fixed feature extractor.

Refer to this part of the tutorial.

This is what you tried above. Here we are interested in only training the classification head of the network. We freeze the backbone of the model.

import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)

# Freeze all the parameters.
for param in model.parameters():
    param.requires_grad = False

# Replace the head with a fresh layer for num_classes.
# (The new layer's parameters have requires_grad=True, so only the head trains.)
in_features = model.fc.in_features
model.fc = nn.Linear(in_features, num_classes)

# You may prefer to add an extra fully connected layer, but that isn't needed in most cases.
# It is left to you; many don't prefer it, as it can cause a large increase in parameters.
# It works well if you have BERT / millions of params in the backbone, so adding a few
# hundred params in the head of the model won't make a big difference.
# Basically: no. of params with pre-trained weights >>> number of new fully connected params.

# Adding an extra fc layer to the head:
hidden_params = 512  # example hidden size
in_features = model.fc.in_features
model.fc = nn.Sequential(
    nn.Linear(in_features, hidden_params),
    # Many prefer dropout in between to avoid over-fitting:
    # nn.Dropout(0.2),
    nn.Linear(hidden_params, num_classes),
)

  3. Unfreezing layers / blocks one by one.

This is where fine-tuning comes into play; we really want to make the most of every block of the network.

You can first freeze the backbone and train the head as in Strategy 2.

This can be trained for a few epochs with a decent learning rate of 1e-3.

Then comes the second training routine.

Now you want to unfreeze specific blocks, say the last 5 conv layers (or a residual block) in ResNet. You would unfreeze only them and continue training with a slightly lower lr of 1e-4, for many more epochs.

You may unfreeze more blocks or stop here; it is very much left to you. Note that after unfreezing a block you keep training progressively (you don't re-freeze the Linear layers when you unfreeze the conv blocks).
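
A rough sketch of the progressive unfreezing (assuming a torchvision ResNet-50, where the last residual block is layer4):

import torch
import torch.nn as nn
from torchvision.models import resnet50

num_classes = 10  # example value
model = resnet50(pretrained=True)

# Stage 1: freeze everything, replace the head, train only the head (lr ~ 1e-3).
for param in model.parameters():
    param.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)
head_optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
# ... train for a few epochs ...

# Stage 2: unfreeze the last residual block and continue with a lower lr (~ 1e-4).
for param in model.layer4.parameters():
    param.requires_grad = True
finetune_optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
# ... train for more epochs; optionally unfreeze layer3, layer2, ... later ...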

I don't know of any other way of doing transfer learning (I haven't seen any other approach); these work well in practice.

P.S.

First of all, my appreciation to you! You are a very young developer (I guess 14), and I'm super excited that you know so much at such a tender age! At your age I was probably more interested in knowing how to install an anti-virus and knew nothing about coding (forget a GitHub account, I didn't even know the word GitHub). You have a great and bright future ahead! Wishing you success.

oke-aditya avatar Oct 26 '21 19:10 oke-aditya

OK, thank you, I understand the issue now. I will start working on it.

Thank you very much @oke-aditya

Programmer-RD-AI avatar Oct 27 '21 02:10 Programmer-RD-AI

hi,

Again, I am really sorry for asking so many questions, but I am not sure I understand this correctly.

So what I need to implement is:

  1. Simply re-training the model with pretrained=True
  2. Training only the head feature extractor.
  3. Unfreezing layers/blocks one by one.

I need to implement the above features in lightning-bolts so that they are easier to use.

Is my understanding correct? I am so sorry for asking so many questions.

If not, what are the specific things I need to work on or implement?
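
For example, I imagine the helper could look roughly like this (purely a hypothetical sketch to confirm the scope, not an existing bolts API):

import torch.nn as nn
from torchvision.models import resnet50


def prepare_for_transfer(num_classes: int, strategy: str = "head_only") -> nn.Module:
    """Hypothetical bolts-style helper covering the three scenarios above."""
    model = resnet50(pretrained=True)

    if strategy == "head_only":
        # 2. Train only the head: freeze the whole backbone.
        for param in model.parameters():
            param.requires_grad = False
    elif strategy == "unfreeze_blocks":
        # 3. Progressive unfreezing: start frozen, unfreeze the last block to begin with.
        for param in model.parameters():
            param.requires_grad = False
        for param in model.layer4.parameters():
            param.requires_grad = True
    # strategy == "full": 1. simply re-train everything with the pre-trained weights.

    # In every case, replace the head for the custom number of classes.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model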

With best regards, Ranuga

Programmer-RD-AI avatar Oct 27 '21 04:10 Programmer-RD-AI