InsightFace-tensorflow: very low accuracy when evaluating a trained model on the faces_ms1m_112x112.tfrecord data?

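Hello, thank you very much for the code, it works well! I have a small question I'd love your help with. After training on the faces_ms1m_112x112.tfrecord dataset, I wanted to check the classification accuracy on that same dataset, but the accuracy is always 0. Verification on .bin files such as lfw and agedb_30, however, gives normal accuracy. Why is that? Thanks!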

Jul 26 '19 13:07 xuweihuawei

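@xuweihuawei The training code prints the training loss and the training classification accuracy for every batch. Is the training accuracy printed there also 0?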

Jul 29 '19 02:07 luckycallor

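Hello, the per-batch classification accuracy printed by the training code is not 0; after a dozen or so epochs it is almost always above 0.90. Maybe I didn't describe the problem clearly enough, so let me restate it: after training with your code I saved a .ckpt model, then loaded that .ckpt model again and read the faces_ms1m_112x112.tfrecord data to check the accuracy on this dataset. The code is roughly as follows:

    import tensorflow as tf  # TensorFlow 1.x API
    # ClassificationImageData and inference are assumed to be imported from the InsightFace-tensorflow code

    train_phase_dropout = tf.placeholder(dtype=tf.bool, shape=None, name='train_phase')
    train_phase_bn = tf.placeholder(dtype=tf.bool, shape=None, name='train_phase_last')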

    cid = ClassificationImageData(img_size=config['image_size'], augment_flag=config['augment_flag'], augment_margin=config['augment_margin'])
    train_dataset = cid.read_TFRecord(config['train_data']).shuffle(10000).repeat().batch(config['batch_size'])
    train_iterator = train_dataset.make_one_shot_iterator()
    train_images, train_labels = train_iterator.get_next()
    train_images = tf.identity(train_images, 'input_images')
    train_labels = tf.identity(train_labels, 'labels')
    
    embds, logits, end_points = inference(train_images, train_labels, train_phase_dropout, train_phase_bn, config)
    # softmax is monotonic, so the argmax of the raw logits gives the same prediction (tf.arg_max is deprecated)
    pred = tf.argmax(logits, axis=-1, output_type=tf.int64)
    train_acc = tf.reduce_mean(tf.cast(tf.equal(pred, train_labels), tf.float32))
            
    #embds, _ = get_embd(images, train_phase_dropout, train_phase_bn, config)
    print('done!')
    tf_config = tf.ConfigProto(allow_soft_placement=True)
    tf_config.gpu_options.allow_growth = True
    with tf.Session(config=tf_config) as sess:
        tf.global_variables_initializer().run()
        print('loading...')
        saver = tf.train.Saver()
        saver.restore(sess, args.model_path)
        print('done!')

        batch_size = config['batch_size']
        # batch_size = 32
        print('evaluating...')
        
        acc, p, l = sess.run([train_acc, pred, train_labels], feed_dict={train_phase_dropout: False, train_phase_bn: False})
        print('acc is:', acc)
        print('pred is:', p)
        print('label is:', l)

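I found that almost all pred values differ from the labels, and acc stays at 0. Later I tried changing loss_type: arcface to softmax in config_ms1m_100.yaml; after training, evaluating this way gave a normal acc above 0.9. Could this result be caused by the way the arcface loss is computed?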
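
For reference, in the standard ArcFace formulation the additive angular margin m is applied only to the logit of the ground-truth class, s·cos(θ_y + m), while every other class gets s·cos θ_j. Logits computed with the labels are therefore not the same as the margin-free logits s·cos θ one would compare at test time, and their argmax can differ. The NumPy sketch below illustrates this; it is a generic implementation of the formula, not this repository's inference() code, and arcface_logits, the scale s=64.0 and the margin m=0.5 are illustrative assumptions rather than values read from config_ms1m_100.yaml. Whether this explains the accuracy of 0 reported above is not settled in the thread.

```python
import numpy as np


def arcface_logits(embds, weights, labels=None, s=64.0, m=0.5):
    """Generic ArcFace logits (illustrative sketch, not this repository's code).

    embds:   (N, D) embeddings
    weights: (D, C) class weights, one column per class
    labels:  (N,) integer class ids; if None, no margin is added (test-time logits)
    """
    # L2-normalize embeddings (rows) and class weights (columns) so e @ w = cos(theta)
    e = embds / np.linalg.norm(embds, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos = np.clip(e @ w, -1.0, 1.0)
    if labels is None:
        return s * cos
    logits = cos.copy()
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])
    # Additive angular margin on the target class only: cos(theta_y + m).
    # The correction most implementations apply when theta_y + m > pi is omitted here.
    logits[rows, labels] = np.cos(theta + m)
    return s * logits


if __name__ == "__main__":
    # Two unit-norm class centers 0.3 rad apart; the embedding is 0.1 rad from class 0.
    W = np.array([[1.0, np.cos(0.3)],
                  [0.0, np.sin(0.3)]])
    x = np.array([[np.cos(0.1), np.sin(0.1)]])
    y = np.array([0])
    print(arcface_logits(x, W).argmax(axis=1))     # [0]  margin-free prediction
    print(arcface_logits(x, W, y).argmax(axis=1))  # [1]  margin applied to the label
```

In this toy example the embedding is closer to class 0, so the margin-free prediction is correct, but adding the margin to the target angle (0.1 + 0.5 rad) lowers its logit below that of class 1, so the label-dependent argmax flips.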

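Also, I noticed that the original arcface implementation in mxnet doesn't evaluate this way either; it always generates .bin data and does 1:1 pair verification. Why isn't classification accuracy on the training data used for evaluation?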
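
On why 1:1 pairs are used: the .bin benchmarks such as LFW and AgeDB-30 store image pairs with a same/different flag rather than class labels, and their identities are generally not among the training classes, so the trained classifier weights cannot be applied to them. Instead, the two embeddings of each pair are compared directly, and a pair counts as correct if their similarity falls on the right side of a threshold. A minimal NumPy sketch of that metric follows, assuming cosine similarity on L2-normalized embeddings; verification_accuracy and best_threshold are hypothetical helpers, not this repository's evaluation code, and the threshold is chosen on the same pairs for simplicity, whereas the usual LFW protocol uses 10-fold cross-validation.

```python
import numpy as np


def verification_accuracy(emb1, emb2, issame, threshold):
    """Accuracy of 1:1 same/different decisions on image pairs (sketch)."""
    # Normalize so the row-wise dot product is the cosine similarity of each pair
    e1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    e2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sims = np.sum(e1 * e2, axis=1)
    pred_same = sims > threshold
    return float(np.mean(pred_same == np.asarray(issame, dtype=bool)))


def best_threshold(emb1, emb2, issame, thresholds=None):
    """Scan cosine thresholds and return (threshold, accuracy) with the highest accuracy.

    Picking the threshold on the same pairs that are scored is optimistic; the
    usual LFW protocol chooses it on 9 folds and scores the 10th.
    """
    if thresholds is None:
        thresholds = np.linspace(-1.0, 1.0, 401)
    accs = [verification_accuracy(emb1, emb2, issame, t) for t in thresholds]
    best = int(np.argmax(accs))
    return float(thresholds[best]), accs[best]


if __name__ == "__main__":
    # Hypothetical embeddings for four pairs: two genuine, two impostor
    emb1 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    emb2 = np.array([[2.0, 0.1], [0.0, 1.0], [0.0, 3.0], [-1.0, 1.0]])
    issame = [True, False, True, False]
    print(verification_accuracy(emb1, emb2, issame, 0.5))  # 1.0
    print(best_threshold(emb1, emb2, issame))
```

For unit vectors, ||a - b||² = 2 - 2·cos θ, so thresholding the squared Euclidean distance between normalized embeddings is equivalent to thresholding cosine similarity; some implementations use the distance form instead.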

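A similar question: I originally trained for 10 epochs and saved the trained .ckpt model at that point, then loaded this .ckpt model and continued training. For the first few batch steps the training acc stayed very low, almost 0, and then it quickly recovered to the high acc it had before saving. Why is that?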

Jul 29 '19 11:07 xuweihuawei

@xuweihuawei Hello, may I ask which public dataset you trained on? Could you share your configuration file?

Aug 05 '19 08:08 wuleibupt

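I downloaded the model you released and, without training it again, also found the accuracy to be very low. I don't know why. Do I have to train it myself? In theory that shouldn't be necessary QAQ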

Aug 16 '19 02:08 liutt1993
