How to use cream.CreamSupernetTrainer() correctly?

Open yuezhuang1387 opened this issue 3 years ago • 3 comments

Describe the issue:

When I use CreamSupernetTrainer(), training fails with NotImplementedError. It seems this happens because Cream does not implement a corresponding CreamMutator. Is there any solution?

Environment:

  • NNI version: 2.3
  • Training service (local|remote|pai|aml|etc):
  • Client OS: win10
  • Python version: 3.8
  • PyTorch/TensorFlow version: PyTorch 1.8.1
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?:

Code:

import torch
from torch import nn
from nni.nas.pytorch import mutables
from torchvision import transforms
import torchvision
from collections import OrderedDict
import os
import ops  # provides DilConv and SepConv (not shown in the issue)
from nni.nas.pytorch.mutator import Mutator
from nni.algorithms.nas.pytorch import cream


class AlexNet(nn.Module):
    def __init__(self):
        super(AlexNet, self).__init__()
        self.con1 = nn.Sequential(
            nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # First searchable layer: one of four candidate convolutions
        self.con2 = mutables.LayerChoice(OrderedDict([
            ('5*5', nn.Conv2d(96, 256, kernel_size=5, padding=2)),
            ('3*3', nn.Conv2d(96, 256, kernel_size=3, padding=1)),
            ('3*3dilsep', ops.DilConv(96, 256, 3, 1, 2, 2)),
            ('3*3sep', ops.SepConv(96, 256, 3, 1, 1))]), key='con2layer_key')
        self.con3 = nn.Sequential(
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU())
        # Second searchable layer: one of three candidate convolutions
        self.con4 = mutables.LayerChoice(OrderedDict([
            ('3*3', nn.Conv2d(384, 384, kernel_size=3, padding=1)),
            ('3*3dilsep', ops.DilConv(384, 384, 3, 1, 2, 2)),
            ('3*3sep', ops.SepConv(384, 384, 3, 1, 1))]), key='con4layer_key')
        self.con5 = nn.Sequential(
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Flatten(),
            nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(4096, 10))

    def forward(self, x):
        x = self.con1(x)
        x = self.con2(x)
        x = self.con3(x)
        x = self.con4(x)
        x = self.con5(x)
        return x


def test():
    model = AlexNet()
    model.train()

    # Resize FashionMNIST images to 224x224 before converting to tensors
    trans = [transforms.ToTensor()]
    resize = 224
    if resize:
        trans.insert(0, transforms.Resize(resize))
    trans = transforms.Compose(trans)
    dataset_train = torchvision.datasets.FashionMNIST(
        root=r'E:\liefeng\Pytest\data', train=True, transform=trans, download=True)
    dataset_test = torchvision.datasets.FashionMNIST(
        root=r'E:\liefeng\Pytest\data', train=False, transform=trans, download=True)

    criterion = nn.CrossEntropyLoss()
    criterion_val = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)

    train_loader = torch.utils.data.DataLoader(dataset_train,
                                               batch_size=64,
                                               shuffle=True,
                                               num_workers=4,
                                               pin_memory=True)
    valid_loader = torch.utils.data.DataLoader(dataset_test,
                                               batch_size=64,
                                               shuffle=False,
                                               num_workers=4,
                                               pin_memory=True)

    trainer_cream = cream.CreamSupernetTrainer(model,
                                               loss=criterion,
                                               val_loss=criterion_val,
                                               optimizer=optimizer,
                                               num_epochs=10,
                                               train_loader=train_loader,
                                               valid_loader=valid_loader,
                                               mutator=Mutator(model),
                                               batch_size=64,
                                               log_frequency=40,
                                               meta_sta_epoch=5,
                                               update_iter=200,
                                               slices=2,
                                               pool_size=10,
                                               pick_method='meta',
                                               choice_num=6,
                                               sta_num=(4, 4, 4, 4, 4),
                                               acc_gap=5,
                                               flops_dict=None,
                                               flops_fixed=0,
                                               local_rank=0,
                                               callbacks=None)

    trainer_cream.enable_visualization()
    trainer_cream.train()  # training
    if not os.path.isdir('model_dir'):
        os.makedirs('model_dir')
        print('create model_dir!')
    trainer_cream.export(file="model_dir/final_AlexNetNAS_cream.json")


if __name__ == '__main__':
    test()

Log message:

[2021-08-31 23:55:35] INFO (nni.nas.pytorch.trainer/MainThread) Creating graph json, writing to logs\1630425332.8086371. Visualization enabled.
[2021-08-31 23:55:35] WARNING (nni.nas.pytorch.mutator/MainThread) Graph is only tested with PyTorch 1.4. Other versions might not work.
[2021-08-31 23:55:37] INFO (nni.nas.pytorch.trainer/MainThread) Epoch 1 Training
Traceback (most recent call last):
  File "E:/Pytest/NAS/AlexNetNAS-cream-nni.py", line 113, in <module>
    test()
  File "E:/Pytest/NAS/AlexNetNAS-cream-nni.py", line 102, in test
    trainer_cream.train() # training
  File "D:\application\anaconda\anaconda3\envs\YUE_PYTHON\lib\site-packages\nni\nas\pytorch\trainer.py", line 154, in train
    loss = self.train_one_epoch(epoch)
  File "D:\application\anaconda\anaconda3\envs\YUE_PYTHON\lib\site-packages\nni\algorithms\nas\pytorch\cream\trainer.py", line 356, in train_one_epoch
    self.mutator.reset()
  File "D:\application\anaconda\anaconda3\envs\YUE_PYTHON\lib\site-packages\nni\nas\pytorch\mutator.py", line 52, in reset
    self._cache = self.sample_search()
  File "D:\application\anaconda\anaconda3\envs\YUE_PYTHON\lib\site-packages\nni\nas\pytorch\mutator.py", line 33, in sample_search
    raise NotImplementedError
NotImplementedError

yuezhuang1387, Aug 31 '21 16:08

Adding @penghouwen (Cream's author) in case @jonsnows and @yuezhuang1387 need further help here.

scarlett2018, Sep 26 '21 08:09

Hi,

Thanks for your interest in Cream!

You could refer to this line for the correct usage of mutator.

mutator = RandomMutator(model)  # instead of mutator=Mutator(model)

Best,

Hao.

Z7zuqer, Sep 28 '21 01:09
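
For readers wondering why swapping the mutator helps: the traceback above shows `reset()` calling `sample_search()`, which raises NotImplementedError in the base class. The sketch below reproduces that pattern in plain Python. It is only an illustration: `BaseMutator`, `RandomChoiceMutator`, and the op names are hypothetical stand-ins, not NNI's actual classes, and the real `RandomMutator` may differ in detail.

```python
import random


class BaseMutator:
    """Illustrative stand-in for a base mutator (NOT NNI's actual class)."""

    def __init__(self, choices):
        # Maps a LayerChoice-style key to the names of its candidate ops
        self.choices = choices
        self._cache = None

    def sample_search(self):
        # The base class only declares the hook; subclasses must implement it
        raise NotImplementedError

    def reset(self):
        # Called before each training step to pick a new architecture
        self._cache = self.sample_search()
        return self._cache


class RandomChoiceMutator(BaseMutator):
    """Hypothetical subclass: picks one candidate per key uniformly at random."""

    def sample_search(self):
        return {key: random.choice(ops) for key, ops in self.choices.items()}


# Hypothetical op names; the keys mirror the ones used in the issue's model
choices = {
    "con2layer_key": ["conv5x5", "conv3x3", "dilconv3x3", "sepconv3x3"],
    "con4layer_key": ["conv3x3", "dilconv3x3", "sepconv3x3"],
}

try:
    BaseMutator(choices).reset()
    base_failed = False
except NotImplementedError:
    # Same failure mode as in the reported traceback
    base_failed = True

sampled = RandomChoiceMutator(choices).reset()
print(base_failed, sampled)
```

In other words, the trainer needs a mutator subclass that actually implements the sampling step; passing the base class directly leaves that step undefined.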

Hi @yuezhuang1387, do you still have this problem? Hao has replied to the issue, and NNI v2.9 has been released. You are welcome to test this issue with the latest version. I really hope your problem has been resolved.

Lijiaoa, Sep 13 '22 03:09

Closed because no reply for a long time.

Lijiaoa, Dec 21 '22 02:12