Predicted Class Labels
Feature request: report or record the predicted class labels when using `do_eval` or `model_eval`. At the moment only the overall accuracy is reported; having the per-example predicted labels would make it possible to characterize model performance on specific test data and gain insight into a problem space.
Just want to ping on this: is there a command that would provide the predicted class labels?
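From the code below, `model_eval` only accumulates whether each prediction was correct, so the labels themselves are never returned. They are, however, already computed internally via `tf.argmax(predictions, axis=-1)`, so one workaround is a single extra `sess.run`. A minimal sketch, assuming `sess`, `x`, and `predictions` are the same objects you would pass to `model_eval` and that `x` has a flexible batch dimension (for large test sets this should be batched, as in the sketch after the quoted code):

```python
import tensorflow as tf

# Hypothetical one-off workaround: pull out the predicted class labels
# directly. `sess`, `x`, and `predictions` are assumed to be the same
# objects that would be passed to model_eval; X_test is the test inputs.
pred_labels = sess.run(tf.argmax(predictions, axis=-1),
                       feed_dict={x: X_test})
```

For reference, the current `model_eval` implementation: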
```python
import math

import numpy as np
import tensorflow as tf

# Private helpers used below come from cleverhans.utils.
from cleverhans.utils import _ArgsWrapper, create_logger

_logger = create_logger("cleverhans.utils.tf")


def model_eval(sess, x, y, predictions, X_test=None, Y_test=None,
               feed=None, args=None):
    """
    Compute the accuracy of a TF model on some data
    :param sess: TF session to use
    :param x: input placeholder
    :param y: output placeholder (for labels)
    :param predictions: model output predictions
    :param X_test: numpy array with test inputs
    :param Y_test: numpy array with test outputs
    :param feed: An optional dictionary that is appended to the feeding
                 dictionary before the session runs. Can be used to feed
                 the learning phase of a Keras model for instance.
    :param args: dict or argparse `Namespace` object.
                 Should contain `batch_size`
    :return: a float with the accuracy value
    """
    global _model_eval_cache
    args = _ArgsWrapper(args or {})

    assert args.batch_size, "Batch size was not given in args dict"
    if X_test is None or Y_test is None:
        raise ValueError("X_test argument and Y_test argument "
                         "must be supplied.")

    # Define accuracy symbolically, caching the comparison op so it is
    # only built once per (y, predictions) pair.
    key = (y, predictions)
    if key in _model_eval_cache:
        correct_preds = _model_eval_cache[key]
    else:
        correct_preds = tf.equal(tf.argmax(y, axis=-1),
                                 tf.argmax(predictions, axis=-1))
        _model_eval_cache[key] = correct_preds

    # Init result var
    accuracy = 0.0

    with sess.as_default():
        # Compute number of batches
        nb_batches = int(math.ceil(float(len(X_test)) / args.batch_size))
        assert nb_batches * args.batch_size >= len(X_test)

        X_cur = np.zeros((args.batch_size,) + X_test.shape[1:],
                         dtype=X_test.dtype)
        Y_cur = np.zeros((args.batch_size,) + Y_test.shape[1:],
                         dtype=Y_test.dtype)
        for batch in range(nb_batches):
            if batch % 100 == 0 and batch > 0:
                _logger.debug("Batch " + str(batch))

            # Must not use the `batch_indices` function here, because it
            # repeats some examples.
            # It's acceptable to repeat during training, but not eval.
            start = batch * args.batch_size
            end = min(len(X_test), start + args.batch_size)

            # The last batch may be smaller than all others. This should not
            # affect the accuracy disproportionately.
            cur_batch_size = end - start
            X_cur[:cur_batch_size] = X_test[start:end]
            Y_cur[:cur_batch_size] = Y_test[start:end]
            feed_dict = {x: X_cur, y: Y_cur}
            if feed is not None:
                feed_dict.update(feed)
            cur_corr_preds = correct_preds.eval(feed_dict=feed_dict)

            accuracy += cur_corr_preds[:cur_batch_size].sum()

        assert end >= len(X_test)

        # Divide by number of examples to get final value
        accuracy /= len(X_test)

    return accuracy


_model_eval_cache = {}
```
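Building on the implementation above, a variant that batches the same way but also returns the per-example predicted labels could look like the sketch below. The name `model_eval_with_preds` is hypothetical (it is not part of cleverhans), and it assumes the placeholders accept a variable batch size, so the final smaller batch is fed directly rather than padded into `X_cur`/`Y_cur`. For brevity it also rebuilds the `tf.argmax` ops on every call rather than caching them the way `model_eval` does:

```python
import math

import numpy as np
import tensorflow as tf


def model_eval_with_preds(sess, x, y, predictions, X_test, Y_test,
                          feed=None, batch_size=128):
    """Hypothetical variant of model_eval that returns both the accuracy
    and the predicted class label for every test example."""
    # Same symbolic ops model_eval builds internally.
    pred_labels_sym = tf.argmax(predictions, axis=-1)
    true_labels_sym = tf.argmax(y, axis=-1)

    nb_batches = int(math.ceil(float(len(X_test)) / batch_size))
    all_preds = np.zeros(len(X_test), dtype=np.int64)
    nb_correct = 0

    for batch in range(nb_batches):
        # Same non-repeating batching scheme as model_eval.
        start = batch * batch_size
        end = min(len(X_test), start + batch_size)
        feed_dict = {x: X_test[start:end], y: Y_test[start:end]}
        if feed is not None:
            feed_dict.update(feed)
        preds, trues = sess.run([pred_labels_sym, true_labels_sym],
                                feed_dict=feed_dict)
        all_preds[start:end] = preds
        nb_correct += (preds == trues).sum()

    accuracy = nb_correct / float(len(X_test))
    return accuracy, all_preds
```

With the label array in hand, performance on specific test data (the motivation above) can be characterized further, e.g. with `sklearn.metrics.confusion_matrix(np.argmax(Y_test, axis=-1), all_preds)`.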