deeplearning-cv-notes

Transfer learning and fine-tuning with pre-trained models in Keras

Open jayboxyz opened this issue 5 years ago • 1 comments

From: http://www.pipiwa.top/blog/show/1199

1. Downloading the pre-trained model and weights automatically

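from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Dropout, GlobalAveragePooling2D
from keras.models import Model

resize = 224  # input height/width; assumed here, set it to match your data

# Convolutional base with ImageNet weights. `classes` has no effect when
# include_top=False, so it is omitted.
base_model = VGG16(weights='imagenet', include_top=False, pooling=None,
                   input_shape=(resize, resize, 3))
# Freeze the base so that only the new head is trained.
for layer in base_model.layers:
    layer.trainable = False
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(64, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(2, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=predictions)
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# train_data: images of shape (N, resize, resize, 3); train_label: assumed shape (N, 2).
his = model.fit(train_data, train_label,
                batch_size=64,
                epochs=50,
                validation_split=0.2,
                shuffle=True)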

The console output shows that the program automatically downloads the VGG16 weights from the internet.
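As a rough check of how little is trained once the base is frozen, the parameter counts for the model in section 1 can be worked out by hand. This is a sketch that assumes the standard VGG16 layout (13 convolutional layers with 3x3 kernels) and the 64-unit head used above; the helper names `conv_params` and `dense_params` are made up for this illustration.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of a k x k convolution."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# (input channels, output channels) of the 13 conv layers in VGG16
VGG16_CONVS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]

# Frozen part: the whole convolutional base (include_top=False).
frozen = sum(conv_params(c_in, c_out) for c_in, c_out in VGG16_CONVS)

# Trainable part: GlobalAveragePooling2D gives one value per channel (512),
# followed by Dense(64) and Dense(2). Pooling and Dropout have no weights.
trainable = dense_params(512, 64) + dense_params(64, 2)

print(frozen, trainable)
```

This prints `14714688 32962`, so under these assumptions only about 0.2% of the weights are updated during training.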

2. Downloading the weights manually

Keras model code: https://github.com/fchollet/deep-learning-models

Original weights download: https://github.com/fchollet/deep-learning-models/releases

TensorFlow model weights (Baidu Cloud download): http://pan.baidu.com/s/1dE9giOD

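There are four VGG16 weight files:

vgg16_weights_tf_dim_ordering_tf_kernels.h5 and vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
vgg16_weights_th_dim_ordering_th_kernels.h5 and vgg16_weights_th_dim_ordering_th_kernels_notop.h5

About notop: notop means the file does not include the fully connected layers at the top of the network. Each backend therefore has two versions, one with the fully connected layer weights and one without.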
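The naming scheme of the four files can be captured in a small helper. This is only a sketch: the function name `vgg16_weight_filename` is hypothetical, and the pattern is inferred from the four file names listed above.

```python
def vgg16_weight_filename(backend, include_top):
    """Build the VGG16 weight file name for a backend tag ('tf' or 'th')."""
    if backend not in ("tf", "th"):
        raise ValueError("backend must be 'tf' or 'th'")
    # Files without the fully connected top carry a '_notop' suffix.
    suffix = "" if include_top else "_notop"
    return f"vgg16_weights_{backend}_dim_ordering_{backend}_kernels{suffix}.h5"


for backend in ("tf", "th"):
    for include_top in (True, False):
        print(vgg16_weight_filename(backend, include_top))
```

Running it lists the same four names as above, which makes it easy to pick the file that matches both the backend and the `include_top` setting.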

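th vs. tf: Keras provides two backends, Theano and TensorFlow. Most th and tf functionality is wrapped uniformly by the backend module, but some notable differences remain, and sometimes you need to pay attention to which backend Keras is running on. The main differences are:

dim_ordering, i.e. the dimension order. For a 224*224 color image, the Theano order is (3, 224, 224), with the channel dimension first, whereas the TensorFlow order is (224, 224, 3), with the channel dimension last.
Data format: "channels_last" corresponds to the former "tf", and "channels_first" corresponds to the former "th". The tf default format is (rows, cols, channels) and the th default format is (channels, rows, cols). For a 128x128 RGB image, "channels_first" organizes the data as (3, 128, 128), while "channels_last" organizes it as (128, 128, 3).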
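The two layouts differ only in where the channel axis sits. The sketch below uses plain Python tuples and nested lists to show the reordering; the function names are made up for this illustration. With NumPy arrays the same reordering would normally be done with `np.transpose(x, (1, 2, 0))`.

```python
def to_channels_last(shape):
    """(channels, rows, cols) -> (rows, cols, channels)"""
    channels, rows, cols = shape
    return (rows, cols, channels)


def to_channels_first(shape):
    """(rows, cols, channels) -> (channels, rows, cols)"""
    rows, cols, channels = shape
    return (channels, rows, cols)


def chw_to_hwc(image):
    """Reorder a nested-list image from [channel][row][col] to [row][col][channel]."""
    channels, rows, cols = len(image), len(image[0]), len(image[0][0])
    return [[[image[c][r][k] for c in range(channels)]
             for k in range(cols)]
            for r in range(rows)]


print(to_channels_last((3, 128, 128)))   # (128, 128, 3)
print(to_channels_first((128, 128, 3)))  # (3, 128, 128)

# A tiny 2-channel image with 1 row and 2 columns.
tiny = [[[1, 2]], [[3, 4]]]
print(chw_to_hwc(tiny))                  # [[[1, 3], [2, 4]]]
```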

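Taking VGG as an example: our backend is TensorFlow and we want to train the fully connected layers ourselves, so we download vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5. The code is as follows: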

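# `weights` also accepts a path to a local weights file; `classes` is omitted
# because it has no effect when include_top=False.
base_model = VGG16(weights='./vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5', include_top=False, pooling=None,
                   input_shape=(resize, resize, 3))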
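If the local file might be missing, one option is to fall back to the automatic download from section 1. The helper `resolve_weights` below is hypothetical and only picks the value to pass as the `weights` argument; it does not load anything itself.

```python
import os


def resolve_weights(path):
    """Return `path` if the file exists, otherwise fall back to 'imagenet'."""
    return path if os.path.isfile(path) else 'imagenet'


weights = resolve_weights('./vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5')
print(weights)
```

The result can then be passed on as `VGG16(weights=weights, include_top=False, ...)`, so the same script works both with and without the manually downloaded file.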

Dec 02 '19 07:12 jayboxyz

References:

Fine-tune InceptionV3 on a new set of classes:

from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K

# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(200, activation='softmax')(x)

# this is the model we will train
model = Model(inputs=base_model.input, outputs=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')

# train the model on the new data for a few epochs
model.fit_generator(...)

# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
# (newer Keras releases name this argument `learning_rate` instead of `lr`)
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy')

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(...)
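The freeze/unfreeze step above only toggles the `trainable` flag by layer index, so it can be wrapped in a small helper. This is a sketch: `freeze_first_n` is a hypothetical name, and the stand-in objects below replace real Keras layers so the snippet runs without Keras installed.

```python
from types import SimpleNamespace


def freeze_first_n(layers, n):
    """Freeze layers[:n], unfreeze layers[n:], and return the trainable count."""
    for layer in layers[:n]:
        layer.trainable = False
    for layer in layers[n:]:
        layer.trainable = True
    return sum(1 for layer in layers if layer.trainable)


# Stand-in objects with a `trainable` attribute; with Keras, pass model.layers.
layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(6)]
n_trainable = freeze_first_n(layers, 4)
print(n_trainable, [layer.trainable for layer in layers])
```

This prints `2 [False, False, False, False, True, True]`. With a real model the call would be `freeze_first_n(model.layers, 249)`, followed by recompiling the model as noted in the comments above.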

Build InceptionV3 on a custom input tensor:

from keras.applications.inception_v3 import InceptionV3
from keras.layers import Input

# this could also be the output of a different Keras model or layer
input_tensor = Input(shape=(224, 224, 3))  # this assumes K.image_data_format() == 'channels_last'

model = InceptionV3(input_tensor=input_tensor, weights='imagenet', include_top=True)

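Deep learning is very much a data-driven discipline: every well-known CNN model was trained on a large dataset. A dataset at the scale of ImageNet contains more than a million images. Ordinary machine learning practitioners and learners face widely varying tasks and can rarely obtain data at that scale, nor do they have the enormous compute of large companies such as Google or Facebook, so training a deep CNN from scratch is practically impossible for them. Fortunately, the parameters of an already trained model can often be transferred to a different dataset with only simple adjustment and training, giving satisfactory results in a short time and without large amounts of compute. This is transfer learning. It works because, although the image datasets differ, most of the low-level features are shared between them.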

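Transfer learning mainly comes in two forms:

The first is transfer learning in the narrow sense. The top layer is removed, for example the 1000-output fully connected layer used for the ImageNet task, and replaced with a new top layer, for example a fully connected layer with 10 outputs. During training only the last two layers are trained: the penultimate layer of the original network and the newly added fully connected output layer. In effect, transfer learning uses the lower layers of the network as a feature extractor.
The second is fine-tuning. As in transfer learning, a new top layer is swapped in, but this time all (or most) of the other layers are trained as well, so the weights of the lower layers are also adjusted as training proceeds.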

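A typical transfer learning workflow looks like this. First train on the new dataset with transfer learning; after a certain number of epochs, switch to fine-tuning and lower the learning rate at the same time. The reason is that if fine-tuning is used from the very beginning, the network has not yet adapted to the new data, and the relatively large gradients during parameter updates can corrupt parameters that were already well trained, which makes the results worse.

Copyright notice: this is an original article by CSDN blogger 「史丹利复合田」, licensed under CC 4.0 BY-SA. Please include the link to the original source and this notice when reposting. Original link: https://blog.csdn.net/tsyccnh/article/details/78889838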
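The reasoning behind the two-stage workflow can be illustrated with a toy model. This is not a CNN, only a one-weight sketch under assumed numbers: the prediction is `h * w * x`, where `w` stands for a pre-trained weight that is already good and `h` for the newly added head. All names and values are chosen for illustration.

```python
def grad_wrt_pretrained(w, h, x, y):
    """Gradient of the squared error (h * w * x - y) ** 2 with respect to w."""
    return 2.0 * (h * w * x - y) * h * x


x, y = 1.0, 2.0
w = 1.0           # "pre-trained" weight, already a good value
h_random = -3.0   # head right after random initialization
h_trained = 2.0   # head after the transfer-learning stage (h * w * x == y)

g_before = grad_wrt_pretrained(w, h_random, x, y)
g_after = grad_wrt_pretrained(w, h_trained, x, y)
print(g_before, g_after)
```

This prints `30.0 0.0`. With an untrained head, the error pushes a large gradient into the pre-trained weight even though that weight is already correct; once the head fits the data, the gradient reaching it is small. Training the head first and then fine-tuning with a lower learning rate keeps those early, large updates away from the pre-trained weights.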

Dec 02 '19 07:12 jayboxyz
