ML-KWS-for-MCU
DSCNN accuracy reported by test.py
Hi, if I am not mistaken, test.py evaluates the dataset against a checkpoint and reports the results as a confusion matrix. When I run test.py for DNN with my dataset, which is just an older version of the Google speech commands dataset, I see better test accuracy for DNN than for DS-CNN (~80% vs. ~60%). That does not match the accuracies reported in the article, where DS-CNN is expected to perform better. I use the same small model size info and other parameters provided here for both train.py and test.py. Do you see the same accuracies? @navsuda
I did some more experiments: the last-step validation accuracy reported by train.py differs from the accuracy reported by test.py using the last available checkpoint. I also tried all the other available checkpoints; sometimes I got better results, but never the highest score shown in TensorBoard or in the training logs. It seems that train.py for DS-CNN does not save the best checkpoint, so the actual best checkpoint is missing at the end. Would it be possible to make train.py save more checkpoints between steps, or to review the code so that the actual best checkpoint is kept? @navsuda

Last-step training log:
INFO:tensorflow:Step 30000: Validation accuracy = 92.63% (N=3093)
INFO:tensorflow:So far the best validation accuracy is 92.82%
test.py with the last generated checkpoint:
INFO:tensorflow:Final test accuracy = 93.12% (N=3081)
INFO:tensorflow:Validation accuracy = 70.45% (N=3093)
INFO:tensorflow:Test accuracy = 69.46% (N=3081)
test.py with another available checkpoint from the 'best' folder:
INFO:tensorflow:Training accuracy = 74.14% (N=22246)
INFO:tensorflow:Validation accuracy = 71.68% (N=3093)
INFO:tensorflow:Test accuracy = 70.92% (N=3081)
Another cause might be the added test category, but even after setting the test data percentage to zero I don't reach the highest validation accuracy reported by train.py.
I changed the training code to save additional checkpoints, but I still could not get the same validation accuracy in TensorBoard (or the train.py output) and in test.py for the same checkpoint. This issue seems to exist only for DS-CNN, not for DNN.
```python
# Save the model checkpoint periodically.
if (training_step % FLAGS.save_step_interval == 0 or
    training_step == training_steps_max):
  checkpoint_path = os.path.join(FLAGS.train_dir, 'all',
                                 FLAGS.model_architecture + '.ckpt')
  tf.logging.info('Saving to "%s-%d"', checkpoint_path, training_step)
  saver.save(sess, checkpoint_path, global_step=training_step)

# Save the model checkpoint when validation accuracy improves.
if total_accuracy > best_accuracy:
  best_accuracy = total_accuracy
  checkpoint_path = os.path.join(FLAGS.train_dir, 'best',
                                 FLAGS.model_architecture + '_' +
                                 str(int(best_accuracy * 10000)) + '.ckpt')
  tf.logging.info('Saving best model to "%s-%d"', checkpoint_path, training_step)
  saver.save(sess, checkpoint_path, global_step=training_step)
tf.logging.info('So far the best validation accuracy is %.2f%%' %
                (best_accuracy * 100))
```
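To compare every saved checkpoint against test.py systematically, one can enumerate the checkpoint prefixes on disk. Here is a small helper (my own sketch, not part of the repo), assuming the `<architecture>_<accuracy>.ckpt-<step>` naming that train.py uses for the 'best' folder:

```python
import glob
import os

def list_checkpoints(train_dir, subdir='best'):
    """Return checkpoint prefixes in train_dir/subdir, sorted by global step.

    A TF checkpoint is a set of files sharing a prefix such as
    'ds_cnn_9282.ckpt-27600'; we key on the .index file and strip its suffix.
    """
    pattern = os.path.join(train_dir, subdir, '*.ckpt-*.index')
    prefixes = {p[:-len('.index')] for p in glob.glob(pattern)}
    return sorted(prefixes, key=lambda p: int(p.rsplit('-', 1)[1]))

# Each returned prefix can then be evaluated with test.py in a loop,
# recording which one actually reproduces the best validation accuracy.
```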
@pooyaww,
The only possible explanation for this discrepancy I can think of is that you used different values for --window_size_ms and --window_stride_ms during training vs. testing. Please make sure you use the same parameters for train.py and test.py.
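The window parameters directly determine the shape of the input feature matrix, so a train/test mismatch silently feeds the restored model differently framed features. A quick sketch of the standard framing arithmetic (my own illustration, not repo code):

```python
def num_frames(clip_duration_ms, window_size_ms, window_stride_ms):
    """Number of analysis frames produced for one audio clip."""
    if clip_duration_ms < window_size_ms:
        return 0
    return 1 + (clip_duration_ms - window_size_ms) // window_stride_ms

# A 1000 ms clip framed with a 40 ms window and 20 ms stride yields 49 frames,
# while the same clip with a 40 ms stride yields only 25 frames, so a model
# trained on one framing cannot be scored correctly under the other.
```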
@navsuda @pooyaww I am hitting the same problem with ds_cnn: the quant_test accuracy is lower than in the same situation with dnn. I checked the code, and all parameters for --window_size_ms and --window_stride_ms are the same during training and testing. Do you know what is wrong? The confusion matrix is as follows:

```
[[  0   0   0   0   0   0   0   0   0   0 371   0]
 [  0 194   0   0   0  47   0   5  17   0 108   0]
 [  0 131  30   0   0 128   2   0  11   0  95   0]
 [  0 146   0   0   0 152   0   5  17   0  85   1]
 [  0  68   0   0   1  19   0   1  30   0 231   0]
 [  0  79   0   0   0 176   0   4  13   0 105   0]
 [  0 163   1   0   0  58  13  10  12   0  95   0]
 [  0 156   0   0   0  31   0 137   3   0  36   0]
 [  0 103   0   0   0  33   0   0 141   0  86   0]
 [  0  59   0   0   0  12   0   1 105   0 196   0]
 [  0  26   0   0   0  11   0   0   5   0 308   0]
 [  0 129   0   0   0  73   0   3  20   0 146   1]]
```

What could cause this?
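For reading a matrix like this: rows are true labels and columns are predictions, so the diagonal divided by the row sums gives per-class accuracy (here almost everything collapses into a few columns). A small numpy sketch for extracting those numbers:

```python
import numpy as np

def confusion_summary(cm):
    """Return (per-class accuracy, overall accuracy) for a confusion matrix.

    Rows are true labels, columns are predicted labels; the diagonal holds
    the correctly classified counts.
    """
    cm = np.asarray(cm, dtype=float)
    row_sums = cm.sum(axis=1)
    per_class = np.divide(np.diag(cm), row_sums,
                          out=np.zeros_like(row_sums), where=row_sums > 0)
    overall = np.trace(cm) / cm.sum()
    return per_class, overall
```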
@xingdonw, One thing to check is: did the accuracy degrade after fusing the batch-norm layers to the preceding convolution layers as mentioned in the guide?
@navsuda
An issue I found in the source code of quant_models.py:
Following the guide, I was trying to quantize the DS-CNN model with the command:
python quant_test.py
```python
# we need an else branch to handle the case when act_max[2*layer_no] == 0
# batch-norm weights folded into depthwise conv
# bn = slim.batch_norm(depthwise_conv, scope=sc+'/dw_conv/batch_norm')
```

B

```python
if (act_max[2*layer_no+1] > 0):
  pointwise_conv = tf.fake_quant_with_min_max_vars(pointwise_conv,
      min=-act_max[2*layer_no+1],
      max=act_max[2*layer_no+1] - (act_max[2*layer_no+1] / 128.0),
      num_bits=8, name='quant_pw_conv' + str(layer_no + 1))
bn = tf.nn.relu(pointwise_conv)
# we need an else branch to handle the case when act_max[2*layer_no+1] == 0
```

C

```python
if act_max[1] > 0:
  net = tf.fake_quant_with_min_max_vars(net, min=-act_max[1],
      max=act_max[1] - (act_max[1] / 128.0), num_bits=8, name='quant_conv1')
net = tf.nn.relu(net)
# we need an else branch to handle the case when act_max[1] == 0
# net = slim.batch_norm(net, scope='conv_1/batch_norm')
```
Can you provide a patch for this issue? Thanks.
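To illustrate what these branches compute, here is a numpy approximation (my own sketch, not the TF op, which additionally nudges min/max to exact grid points) of the symmetric 8-bit saturation applied with min = -act_max and max = act_max - act_max/128, treating act_max == 0 as "pass through unquantized":

```python
import numpy as np

def fake_quant_relu(x, act_max, num_bits=8):
    """Approximate fake-quantization (symmetric, 8-bit) followed by ReLU.

    act_max == 0 is treated as 'no quantization', which is the behaviour an
    else branch would need to preserve.
    """
    x = np.asarray(x, dtype=float)
    if act_max > 0:
        levels = 2 ** (num_bits - 1)           # 128 for 8 bits
        scale = act_max / levels               # quantization step
        x = np.clip(x, -act_max, act_max - scale)
        x = np.round(x / scale) * scale        # snap to the quantized grid
    return np.maximum(x, 0.0)                  # ReLU
```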
@navsuda
Another issue: when I run python quant_test.py --act_max 64 0 0 0 0 0 0 0 0 0 0 0 .... for the DS-CNN model,
I get a training accuracy very close to the training accuracy reported by test.py.
But when I try to quantize the second activation with the command:
python quant_test.py --act_max 64 x 0 0 0 0 0 0 0 0 0 0
replacing x with 128, 64, 32, 16, 8, 4, 2, or 1,
the reported training accuracy is much lower in every case than with x == 0.
Could you give some idea about this case? Thanks.
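Instead of blindly sweeping x, one heuristic (my own suggestion, not from the repo) is to record the maximum absolute activation of each quantized layer on a calibration batch and round it up to the next power of two, which keeps the quantization scale a clean binary shift:

```python
import math

def suggest_act_max(observed_abs_max):
    """Smallest power of two >= the observed max |activation| of a layer."""
    if observed_abs_max <= 0:
        return 0  # nothing observed: leave the layer unquantized
    return 2 ** math.ceil(math.log2(observed_abs_max))

# If a layer's activations never exceed 37.2 in magnitude, act_max = 64
# covers the full range; a much smaller or larger value wastes precision
# or clips, either of which can crater accuracy.
```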