
Embed a single image

Open lamhoangtung opened this issue 6 years ago • 14 comments

Hi, I'm trying to write a script to embed a single image based on your code. It looks something like this:

import json
import os
from importlib import import_module

import cv2
import tensorflow as tf
import numpy as np

sess = tf.Session()

# Read config
config = json.loads(open(os.path.join(
    '<exp_root>', 'args.json'), 'r').read())

# Input img
net_input_size = (
    config['net_input_height'], config['net_input_width'])
img = tf.placeholder(tf.float32, (None, net_input_size[0], net_input_size[1], 3))

# Create the model and an embedding head.
model = import_module('nets.' + config['model_name'])
head = import_module('heads.' + config['head_name'])

endpoints, _ = model.endpoints(img, is_training=False)
with tf.name_scope('head'):
    endpoints = head.head(endpoints, config['embedding_dim'], is_training=False)

# Initialize the network/load the checkpoint.
checkpoint = tf.train.latest_checkpoint(config['experiment_root'])
print('Restoring from checkpoint: {}'.format(checkpoint))
tf.train.Saver().restore(sess, checkpoint)


raw_img = cv2.imread('<img>')
raw_img = cv2.resize(raw_img, net_input_size)
raw_img = np.swapaxes(raw_img, 0, 1)
raw_img = np.expand_dims(raw_img, axis=0)

emb = sess.run(endpoints['emb'],  feed_dict={img: raw_img})[0]

But the results for the same image from my code and from yours are not the same.

Note that no augmentation is applied when I compute the embedding vector.

Am I missing anything here? Thank you for the help.

lamhoangtung avatar Jun 03 '19 07:06 lamhoangtung

Quick update: I've just found out that you use tf.image.decode_jpeg and tf.image.resize_images instead of OpenCV. I switched to them; the output changed, but it still doesn't match the result from your code.

Am I missing something like normalization? Here is what I've changed:

path = tf.placeholder(tf.string)
image_encoded = tf.read_file(path)
image_decoded = tf.image.decode_jpeg(image_encoded, channels=3)
image_resized = tf.image.resize_images(image_decoded, net_input_size)
img = tf.expand_dims(image_resized, axis=0)

Thanks ;)

lamhoangtung avatar Jun 03 '19 08:06 lamhoangtung

The only thing that comes to mind right now is that by default we use test-time augmentation, which you don't. But that depends on how you are using our embed script to create comparable embeddings in this case.


Pandoro avatar Jun 03 '19 09:06 Pandoro

Hi @Pandoro, thanks for the quick response. This is what I use to compute the embedding vector:

python3 embed.py \
    --experiment_root ... \
    --dataset ... \
    --filename ...

I extracted the vector from the .h5 file.

Anyway, how can I do TTA in my case? Is there any code in your repo I can reference?

lamhoangtung avatar Jun 03 '19 10:06 lamhoangtung

If you use it like that, it should actually not be doing any test-time augmentation, so that shouldn't be it either. The code to do so is included in embed.py. The only thing that comes to mind is that maybe something goes wrong during extraction of the embedding. Have you tried creating a csv file containing only the one image you want to embed?
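For illustration, such a one-image csv could be written like this (the two-column identity,relative-path layout and the file names here are assumptions; match them to your own dataset files):

```python
import csv

# Hypothetical identity and image path; adjust to your dataset layout.
with open('single_image.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['0001', 'query/0001_c1s1_001051_00.jpg'])
```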


Pandoro avatar Jun 03 '19 11:06 Pandoro

Hi. I did an experiment with a csv file containing only the image that I want to embed and found something really strange. There might actually be nothing wrong with your embed code or with my inference code.

  • The h5 output file that I previously used for comparison was created on a remote server with a GPU.
  • My inference code was run on my local machine, which only has a CPU. After recomputing everything on the CPU alone, I found a big difference between the embedding vectors computed on GPU vs CPU. (My code and yours produce exactly the same results.)
  • Note that the difference is HUGE, like completely different. I double-checked the model, code, and input images for the experiment. Have you ever seen something like this? Am I wrong at some point?
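When comparing two runs, a quick elementwise check helps distinguish tiny float drift from a genuinely different vector (a generic numpy sketch, not code from the repo):

```python
import numpy as np

def compare(a, b):
    """Return (max absolute difference, cosine similarity) of two vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    max_abs = float(np.max(np.abs(a - b)))
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max_abs, cos

# Harmless GPU/CPU drift: max_abs near 0 and cosine near 1.
# "Completely different": large max_abs and a much lower cosine.
```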

lamhoangtung avatar Jun 03 '19 11:06 lamhoangtung

I haven't seen this before. I wouldn't be surprised if there were tiny differences, but we frequently used CPUs to embed and evaluate when all GPUs were busy, and that worked fine. So something seems to be wrong. Are you using the same tensorflow version for both CPU and GPU?


Pandoro avatar Jun 03 '19 11:06 Pandoro

@Pandoro Same tensorflow 1.12.0 on both machines.

lamhoangtung avatar Jun 03 '19 12:06 lamhoangtung

An update on this. I tried to redo everything, even the training, and here are the results:

  • Embedding vector computed by my single-image inference code on both GPU and CPU (same result):
[ 0.1475507   0.26669884 -0.10536072 -0.7495441  -0.05301389 -0.12123938
 -0.2105978   0.34713405 -0.06077751  0.38768452  0.46736327 -0.14455695
 -0.13443749  0.4708902  -0.53196555 -0.4674694   0.4387072  -0.01120797
  0.03252156  0.11937858  0.03637908 -0.23512752 -0.087494    0.40861905
  0.39684698 -0.25528368  0.53282946 -0.7992279  -0.04100448  0.607317
  0.37891495 -0.43027154 -0.09188752 -0.31797376  0.2922396   0.3039867
 -0.21458632 -0.40264758  0.01471368  0.14217973  0.29642326 -0.33412308
  0.61750454  0.02563823 -0.4100364  -0.4894322  -0.33408296 -0.30945992
 -0.03018434  0.06986241 -0.3707401  -0.1222352   0.19458997 -0.11415277
 -0.04913341 -0.0650656  -0.23189925 -0.3081076  -0.04566643  0.56977797
  0.1199189  -0.25228524 -0.10953259  0.5716973   0.07392599 -0.1805463
  0.03953229  0.12185388 -0.15962987 -0.21938688 -0.05884064  0.34342512
  0.26555967  0.21485685  0.3734443  -0.19710182 -0.4279406   0.23197423
 -0.27009133  0.30459598 -0.37105414  0.4993727   0.1789047   0.04352051
 -0.16855955 -0.6482116  -0.1902902  -0.02592199 -0.00989667  0.5478813
  0.3826628  -0.33704245  0.3876207  -0.39746612 -0.4097886   0.14956611
  0.03482605 -0.27635813  0.05575407 -0.26498005 -0.19787493 -0.22036389
  0.21582448  0.46559668 -0.41869876  0.12922227  0.0621463   0.01098646
  0.06490406  0.35996896  0.21602859 -0.34911785 -0.18451497  0.05639197
  0.04268607 -0.072242   -0.23873544 -0.09557254  0.03791614 -0.19931975
 -0.07070286  0.09722421  0.29151836 -0.02433551  0.2241952  -0.96187866
  0.13102485  0.00164846]
  • Embedding vector computed by your code on a 100-line .csv file containing a single sample duplicated 100 times: all 100 output vectors are identical, the same on GPU and CPU, and the same as above.
  • Embedding vector computed by your code on a 100k+ line .csv file whose first sample is the one used in the experiment above: the same on GPU and CPU, but not the same as above:
[ 4.46426451e-01  2.67341495e-01 -3.03951055e-01 -1.09888956e-01
  1.48094699e-01  1.09376453e-01  3.18785965e-01 -2.31513470e-01
  9.18060988e-02  9.47581697e-03 -3.14935297e-01 -5.06232917e-01
  2.13361338e-01  5.70732616e-02  5.59608713e-02 -2.04994321e-01
 -7.14561269e-02  4.35655147e-01  4.42430824e-01 -1.19181640e-01
 -9.79143828e-02  3.38607967e-01 -8.01632106e-02  8.19585398e-02
  3.10744733e-01 -5.10766864e-01  3.90632376e-02  3.73192802e-02
 -2.21006293e-02  1.50721356e-01  3.10757637e-01 -1.00263797e-01
 -3.67254391e-02  3.62346590e-01 -2.23815039e-01 -4.09024119e-01
 -7.41786659e-01 -2.77244627e-01 -6.83265150e-01 -3.71105620e-04
  3.62792283e-01 -3.34418714e-01  4.02492136e-01  2.93934852e-01
  5.06364256e-02  1.14161275e-01 -1.49569120e-02  2.07622617e-01
  9.04084072e-02  2.35464871e-01  1.60102062e-02 -1.07340008e-01
 -6.13746643e-01 -1.84301529e-02 -3.65158543e-02 -2.17433404e-02
  4.48067039e-01  3.31106067e-01  2.05742702e-01 -1.24085128e-01
  2.07252398e-01 -5.85925281e-01 -2.59883493e-01  2.63391703e-01
 -3.12482953e-01 -1.48463324e-01 -2.19984993e-01  3.31126675e-02
  1.76012367e-01  3.09261560e-01 -1.59823354e-02  1.53631851e-01
  1.53570157e-02 -2.29165092e-01  3.28389913e-01 -2.26212129e-01
 -3.93793285e-01 -1.54186189e-01 -4.85752940e-01  1.30166719e-02
 -5.14035374e-02 -1.77116096e-01  9.73375281e-05 -2.54578739e-02
  3.99445705e-02  4.45321977e-01  2.78115660e-01 -1.51245281e-01
 -3.03700745e-01 -3.81025001e-02  1.43309757e-01 -6.55035377e-01
  8.83019418e-02 -3.06550767e-02 -4.80769187e-01  4.71787043e-02
  5.49029335e-02 -1.17088296e-01  3.43144536e-01 -7.30120242e-02
 -3.58440757e-01 -1.66995618e-02 -3.06979388e-01  5.11138923e-02
  1.75048336e-01 -1.83060188e-02 -3.81746352e-01 -6.02350771e-01
 -3.84051464e-02  5.41097879e-01  2.33160406e-01  8.10048282e-02
 -4.97415751e-01 -3.47296298e-02 -8.40142891e-02  2.04959571e-01
  6.48377165e-02 -1.64840698e-01  1.98047027e-01  1.82637498e-01
 -9.53407511e-02  2.63416976e-01 -1.82583451e-01 -3.99179049e-02
  2.82630742e-01 -6.65262759e-01 -5.13938844e-01 -1.60764366e-01]
  • I searched through all 100k+ computed vectors to see if any indices got mixed up, but I couldn't find a match.

Where could I be going wrong? Here is how I extract the vector out of the h5 file:

import h5py
import numpy as np
import pandas as pd

raw_embedding = h5py.File('....h5', 'r')
raw_label = pd.read_csv('...csv')

def load_data():
    features = raw_embedding['emb'][:]
    labels = list(raw_label.iloc[:, 1])
    return (features, labels)

vecs, imgs = load_data()
print(vecs[0], imgs[0])
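To double-check that a given image maps to the row you expect, the csv can be scanned for its index first (an illustrative helper; it assumes the image path sits in the second column, as in the snippet above):

```python
import csv

def find_row(csv_path, image_path):
    # Return the 0-based row index whose second column equals image_path.
    with open(csv_path, newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            if len(row) > 1 and row[1] == image_path:
                return i
    return None
```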

Thanks for your help @Pandoro

lamhoangtung avatar Jun 04 '19 03:06 lamhoangtung

Note: I tried a bunch of different images, so the problem is not related only to the first sample of the dataset. => Question: did you do any dataset-level normalization?

lamhoangtung avatar Jun 04 '19 03:06 lamhoangtung

I can't say this sounds like anything I've seen before. If I understand correctly, GPU and CPU results are now the same, but the result depends on whether there are several other images in the batch or just the one specific image?

It sounds like something might be going wrong with the batch normalization, but your script clearly sets is_training=False. We don't do any other normalization, so I honestly have no idea where this could be coming from.
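For intuition, here is why training-mode batch norm would make one image's embedding depend on its batch mates, while is_training=False would not (an illustrative numpy sketch, not the network's actual code):

```python
import numpy as np

def batchnorm(x, mean, var, eps=1e-5):
    # Normalize features with the given statistics.
    return (x - mean) / np.sqrt(var + eps)

batch = np.array([[1.0, 2.0],
                  [10.0, 20.0]])

# Training mode: statistics come from the current batch, so the first
# row's output changes depending on what else shares the batch.
train_full = batchnorm(batch, batch.mean(axis=0), batch.var(axis=0))
train_solo = batchnorm(batch[:1], batch[:1].mean(axis=0), batch[:1].var(axis=0))

# Inference mode (is_training=False): fixed moving averages are used,
# so each row's output is independent of its batch mates.
mov_mean, mov_var = np.zeros(2), np.ones(2)
infer_full = batchnorm(batch, mov_mean, mov_var)
infer_solo = batchnorm(batch[:1], mov_mean, mov_var)
```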


Pandoro avatar Jun 04 '19 13:06 Pandoro

So which one should I use? Which one is more accurate? Should I create fake batches, or should I keep batch_size = 1 for inference?

lamhoangtung avatar Jun 04 '19 16:06 lamhoangtung

There is no useful answer to that question. What you are seeing shouldn't be happening. Currently I don't have time to investigate whether this is an issue with our code, but I highly doubt it, since we haven't seen any such issues so far.

As it is right now, your setup seems to be somehow broken and thus there is no "more accurate".

What you could do is to try and download our pretrained model and run the evaluation on Market-1501 to see if you can recreate our original scores. If you get a different score, something else is broken.


Pandoro avatar Jun 04 '19 16:06 Pandoro

@lamhoangtung, were you able to figure this out? I'm trying to follow your steps to generate embeddings and compare them, but so far I'm running into some errors:

I cannot load the model this way for some reason (#85):

checkpoint = tf.train.latest_checkpoint(config['experiment_root'])

I tried loading the model this way,

saver = tf.train.import_meta_graph('experiments\my_experiment\checkpoint-25000.meta')
saver.restore(sess, 'experiments\my_experiment\checkpoint-25000')

but that still gives me an error when I try to run
emb = sess.run(endpoints['emb'], feed_dict={img: raw_img})[0]

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma
	 [[node resnet_v1_50/block4/unit_3/bottleneck_v1/conv3/BatchNorm/gamma/read (defined at C:\Users\mazat\Documents\Python\trinet\nets\resnet_v1.py:118) ]]
	 [[node head/emb/BiasAdd (defined at C:\Users\mazat\Documents\Python\trinet\heads\fc1024.py:17) ]]

Thanks

mazatov avatar Nov 06 '19 18:11 mazatov

@lamhoangtung I think I figured out the first problem.

  1. cv2 loads the image in BGR channel order, so you need to convert it to RGB.
  2. There seem to be some differences in how cv2 and tensorflow decode jpeg images. Check https://stackoverflow.com/questions/45516859/differences-between-cv2-image-processing-and-tf-image-processing
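The BGR/RGB point alone can be shown without OpenCV (an illustrative numpy sketch; the slice reversal below has the same effect as cv2.cvtColor with COLOR_BGR2RGB):

```python
import numpy as np

bgr = np.zeros((1, 1, 3), dtype=np.uint8)
bgr[0, 0] = (255, 0, 0)      # a pure-blue pixel in OpenCV's BGR order

rgb = bgr[..., ::-1]         # reverse the channel axis: BGR -> RGB
# Without this conversion the network would see the blue pixel as red,
# i.e. a visibly different image, and produce a different embedding.
```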

So, to get the cv2-loaded embeddings close to the embed.py values, I did the following:

raw_img = cv2.imread(os.path.join(config['image_root'],'query', '0001_c1s1_001051_00.jpg'))
raw_img = cv2.cvtColor(raw_img, cv2.COLOR_BGR2RGB)
raw_img = cv2.resize(raw_img, (net_input_size[1], net_input_size[0]))
raw_img = np.expand_dims(raw_img, axis=0)

If you want to get exactly the same values, you can load the image with TF instead of cv2:

image_encoded = tf.read_file(os.path.join(config['image_root'],'query', '0001_c1s1_001051_00.jpg'))
image_decoded = tf.image.decode_jpeg(image_encoded, channels=3)
image_resized = tf.image.resize_images(image_decoded, net_input_size)
img = tf.expand_dims(image_resized, axis=0)

# Create the model and an embedding head.
model = import_module('nets.' + config['model_name'])
head = import_module('heads.' + config['head_name'])

endpoints, _ = model.endpoints(img, is_training=False)
with tf.name_scope('head'):
    endpoints = head.head(endpoints, config['embedding_dim'], is_training=False)

tf.train.Saver().restore(sess, os.path.join(config['experiment_root'],'checkpoint-25000') )

emb = sess.run(endpoints['emb'])[0]

I got almost identical embeddings this way.

mazatov avatar Nov 07 '19 18:11 mazatov