ML-KWS-for-MCU

DSCNN accuracy reported by test.py

Open pooyaww opened this issue 6 years ago • 7 comments

Hi, if I am not mistaken, test.py evaluates the dataset on a given checkpoint and reports the results as a confusion matrix. When I use test.py for DNN with my dataset, which is just an older version of the Google dataset, I see better test accuracy for DNN than for DSCNN (~80% vs. ~60%), which is inconsistent with the accuracies reported in the article; DSCNN is expected to perform better. I use the same small-model size info and other parameters provided here, both for train.py and test.py. Do you see the same accuracies? @navsuda

pooyaww avatar Jun 25 '18 18:06 pooyaww

I did some more experiments, since the last-step validation accuracy reported by train.py differs from the accuracy reported by test.py on the last available checkpoint. I also tried all the other available checkpoints; sometimes I got better results, but never the highest score shown in TensorBoard or in the training log. It seems that train.py for DSCNN does not save the best checkpoint, so the actual best checkpoint is missing at the end. Is it possible to make train.py save more checkpoints between steps, or could the code be revised so the actual best checkpoint is kept? @navsuda Last-step training log:

INFO:tensorflow:Step 30000: Validation accuracy = 92.63% (N=3093)
INFO:tensorflow:So far the best validation accuracy is 92.82%

test.py with the last generated checkpoint:

INFO:tensorflow:Final test accuracy = 93.12% (N=3081)
INFO:tensorflow:Validation accuracy = 70.45% (N=3093)
INFO:tensorflow:Test accuracy = 69.46% (N=3081)

test.py with another available checkpoint from the 'best' folder:

INFO:tensorflow:Training accuracy = 74.14% (N=22246)
INFO:tensorflow:Validation accuracy = 71.68% (N=3093)
INFO:tensorflow:Test accuracy = 70.92% (N=3081)

Another reason might be the addition of the test category, but even with the test data percentage set to zero I don't reach the highest validation accuracy reported by train.py.
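For reference, this is how I zeroed out the test split (assuming this fork keeps the stock speech_commands split flags; the other flags are unchanged):

    python train.py --testing_percentage 0 ...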

pooyaww avatar Jun 26 '18 10:06 pooyaww

I changed the training code to save additional checkpoints, but I still could not get the same validation accuracy from TensorBoard (or the train.py output) and from test.py for the same checkpoint. The issue seems to exist only for DSCNN, not for DNN.

      # Save the model checkpoint periodically.
      if (training_step % FLAGS.save_step_interval == 0 or
          training_step == training_steps_max):
        checkpoint_path = os.path.join(FLAGS.train_dir, 'all',
                                       FLAGS.model_architecture + '.ckpt')
        tf.logging.info('Saving to "%s-%d"', checkpoint_path, training_step)
        saver.save(sess, checkpoint_path, global_step=training_step)

        # Save the model checkpoint when validation accuracy improves.
        if total_accuracy > best_accuracy:
          best_accuracy = total_accuracy
          checkpoint_path = os.path.join(FLAGS.train_dir, 'best',
                                         FLAGS.model_architecture + '_' +
                                         str(int(best_accuracy*10000)) + '.ckpt')
          tf.logging.info('Saving best model to "%s-%d"', checkpoint_path, training_step)
          saver.save(sess, checkpoint_path, global_step=training_step)

      tf.logging.info('So far the best validation accuracy is %.2f%%' % (best_accuracy*100))
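A side note on the disappearing checkpoints: tf.train.Saver keeps only the five most recent checkpoints by default, so older ones, possibly including the best, are deleted automatically. A minimal tweak, assuming the stock saver construction in train.py:

      # keep every checkpoint instead of only the five most recent ones;
      # max_to_keep=None disables the automatic deletion entirely
      saver = tf.train.Saver(tf.global_variables(), max_to_keep=None)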

pooyaww avatar Jun 26 '18 14:06 pooyaww

@pooyaww, the only explanation for this discrepancy I can think of is that you used different values of --window_size_ms and --window_stride_ms during training vs. testing. Please make sure you pass the same parameters to train.py and test.py.
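For example (the values below are only illustrative; what matters is that both invocations use identical values):

    python train.py --model_architecture ds_cnn --window_size_ms 40 --window_stride_ms 20 ...
    python test.py --model_architecture ds_cnn --window_size_ms 40 --window_stride_ms 20 ...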

navsuda avatar Jul 11 '18 06:07 navsuda

@navsuda @pooyaww I am hitting the same problem with ds_cnn: the quant_test accuracy is lower than it is in the same situation with dnn. I checked the code, and all the --window_size_ms and --window_stride_ms parameters are the same for training and testing. Do you know what's wrong? The confusion matrix is as follows:

    [[  0   0   0   0   0   0   0   0   0   0 371   0]
     [  0 194   0   0   0  47   0   5  17   0 108   0]
     [  0 131  30   0   0 128   2   0  11   0  95   0]
     [  0 146   0   0   0 152   0   5  17   0  85   1]
     [  0  68   0   0   1  19   0   1  30   0 231   0]
     [  0  79   0   0   0 176   0   4  13   0 105   0]
     [  0 163   1   0   0  58  13  10  12   0  95   0]
     [  0 156   0   0   0  31   0 137   3   0  36   0]
     [  0 103   0   0   0  33   0   0 141   0  86   0]
     [  0  59   0   0   0  12   0   1 105   0 196   0]
     [  0  26   0   0   0  11   0   0   5   0 308   0]
     [  0 129   0   0   0  73   0   3  20   0 146   1]]

What could cause this?

xingdonw avatar Jul 27 '18 09:07 xingdonw

@xingdonw, One thing to check: did the accuracy degrade after fusing the batch-norm layers into the preceding convolution layers, as described in the guide?
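For reference, fusing means the batch-norm scale and shift are absorbed into the preceding convolution's weights and bias so the BN layer can be dropped. A numpy sketch of the standard folding rule (the math only, not the exact code in this repository):

    import numpy as np

    def fold_batch_norm(w, b, gamma, beta, mean, var, eps=1e-3):
        # w: conv weights in HWIO layout (kh, kw, in_ch, out_ch);
        # gamma, beta, mean, var: per-output-channel BN parameters
        scale = gamma / np.sqrt(var + eps)
        w_fold = w * scale                  # broadcasts over the out_ch axis
        b_fold = beta + (b - mean) * scale  # folded bias
        return w_fold, b_fold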

navsuda avatar Aug 28 '18 03:08 navsuda

@navsuda An issue I found in the source code of quant_models.py: according to the guide, I was trying to quantize the ds-cnn model with the command

    python quant_test.py --act_max 0 0 0 0 0 0 0 0 0 0 0 0

which should output an accuracy equal to that of test.py, because we use floating point for all activations. That is not the case, and I found the reason. In quant_models.py:

A)

    if(act_max[2*layer_no]>0):
      depthwise_conv = tf.fake_quant_with_min_max_vars(depthwise_conv,
          min=-act_max[2*layer_no],
          max=act_max[2*layer_no]-(act_max[2*layer_no]/128.0),
          num_bits=8, name='quant_ds_conv'+str(layer_no))
    bn = tf.nn.relu(depthwise_conv)
    # we need an else branch to handle the case when act_max[2*layer_no] == 0
    # batch-norm weights folded into depthwise conv
    # bn = slim.batch_norm(depthwise_conv, scope=sc+'/dw_conv/batch_norm')

B)

    if(act_max[2*layer_no+1]>0):
      pointwise_conv = tf.fake_quant_with_min_max_vars(pointwise_conv,
          min=-act_max[2*layer_no+1],
          max=act_max[2*layer_no+1]-(act_max[2*layer_no+1]/128.0),
          num_bits=8, name='quant_pw_conv'+str(layer_no+1))
    bn = tf.nn.relu(pointwise_conv)
    # we need an else branch to handle the case when act_max[2*layer_no+1] == 0

C)

    if act_max[1]>0:
      net = tf.fake_quant_with_min_max_vars(net, min=-act_max[1],
          max=act_max[1]-(act_max[1]/128.0), num_bits=8, name='quant_conv1')
    net = tf.nn.relu(net)
    # we need an else branch to handle the case when act_max[1] == 0
    # net = slim.batch_norm(net, scope='conv_1/batch_norm')

Can you provide a patch for this issue? Thanks.
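For clarity, the kind of else branch I mean for case A would look roughly like this (only a sketch that reuses the commented-out slim.batch_norm line above, so the floating-point path still gets batch-norm; I have not verified it against the folded weights):

    if(act_max[2*layer_no]>0):
      depthwise_conv = tf.fake_quant_with_min_max_vars(depthwise_conv,
          min=-act_max[2*layer_no],
          max=act_max[2*layer_no]-(act_max[2*layer_no]/128.0),
          num_bits=8, name='quant_ds_conv'+str(layer_no))
      bn = tf.nn.relu(depthwise_conv)
    else:
      # floating-point path: apply the original batch-norm instead of
      # assuming its parameters were folded into the depthwise conv
      bn = slim.batch_norm(depthwise_conv, scope=sc+'/dw_conv/batch_norm')
      bn = tf.nn.relu(bn)

Cases B and C would get the analogous branches.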

leiming0225 avatar Nov 07 '18 10:11 leiming0225

@navsuda Another issue: when I run python quant_test.py --act_max 64 0 0 0 0 0 0 0 0 0 0 0 .... for the ds-cnn model, I get a training accuracy very close to the training accuracy output by test.py. But when I try to quantize the second activation with the command python quant_test.py --act_max 64 x 0 0 0 0 0 0 0 0 0 0 and replace x with 128, 64, 32, 16, 8, 4, 2, or 1, the resulting training accuracy is always much lower than when x == 0. Could you please give some ideas about this case? Thanks.
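To show what I expect act_max to do, here is a numpy sketch mirroring the min/max formula in quant_models.py (illustration only):

    import numpy as np

    def fake_quant(x, act_max, num_bits=8):
        # 8-bit grid over [-act_max, act_max - act_max/128], step = act_max/128
        step = act_max / 2 ** (num_bits - 1)
        x = np.clip(x, -act_max, act_max - step)  # out-of-range values saturate
        return np.round(x / step) * step          # snap to the nearest grid point

    acts = np.array([-3.0, -0.4, 0.05, 0.7, 5.0])
    print(fake_quant(acts, act_max=1.0))   # small range: -3.0 and 5.0 clip hard
    print(fake_quant(acts, act_max=64.0))  # large range: step 0.5, small values vanish

So a too-small range saturates the large activations and a too-large range destroys the small ones, but since no value of x from 1 to 128 helps here, the range alone may not be the problem.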

leiming0225 avatar Nov 07 '18 10:11 leiming0225