caffe-tensorflow
Converted tensorflow model's output does not match the caffe output
Hi,
I was able to convert the model from Caffe to TensorFlow using the repo. The model is a CNN with mainly conv, batch normalisation, max-pool, fully connected and softmax layers.
However, the outputs of the Caffe model and the TensorFlow model do not match. I have 2 classes for which I run classification. For the same image, the Caffe model is very confident ([0.99, 0.01]), while the TensorFlow model is far less confident ([0.49, 0.51]).
I observe the same results for many images: the classification output of the TensorFlow model is always around [0.489, 0.511] and so on. Basically both classes look equally probable, while the Caffe model is very confident about one of the classes.
Has anyone else seen this? Any ideas on where the bug could be?
The output is always the same/similar. I am using TensorFlow 1.2.1.
I found the same thing with my models.
The problem was that I hadn't converted my input data into TensorFlow's HxWxC layout/ordering. (Caffe's is CxHxW.)
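Roughly, the layout change is just a transpose of the array axes. A minimal numpy sketch (array names and shapes are placeholders, not from the converter):

import numpy as np

# Hypothetical Caffe-style blob: channels first (C, H, W).
chw_image = np.zeros((3, 224, 224), dtype=np.float32)

# TensorFlow's default layout is channels last (H, W, C).
hwc_image = np.transpose(chw_image, (1, 2, 0))  # now (224, 224, 3)

# Going back the other way would be np.transpose(hwc_image, (2, 0, 1)).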
Hi @sanchom, thanks for responding. After converting the model using this repo, I am reading an image using PIL and converting it into an np.array. I checked that the input is in HxWxC shape.
Is this what we are supposed to do? I would really appreciate any pointers, or if you could share your code for the pre-processing (input conversion) steps.
Also, could you answer this:
- Do we need to change the channels to BGR before using the converted model?
Another weird thing I found is that the output changes for the same input if I run the code many times!
@shresthamalik I think that's correct. We need to change our input shape/layout to match the expectation of the framework. Caffe expects CxHxW; TensorFlow expects HxWxC.
Here's how I changed my code to get the same results from a TensorFlow-converted model as from the original Caffe model:
std::vector<float> data(3 * height_ * width_);
int offset = 0;
// for (int c = 0; c < 3; ++c) {  // This line was here when I was feeding the data to Caffe (CxHxW).
for (int h = 0; h < height_; ++h) {
  for (int w = 0; w < width_; ++w) {
    for (int c = 0; c < 3; ++c) {  // Now, the line from above is here (HxWxC).
      // Center around 128 and scale by 1/255 (0.00392156862...).
      data[offset] =
          (resized_image.at<cv::Vec3b>(h, w)[c] - 128) * 0.00392156862;
      ++offset;
    }
  }
}
You do have to provide the same pixel order (BGR or RGB) to the TensorFlow-converted model as you did to Caffe. Neither framework treats the channels in any special way, but you need to make sure that what was going into channel 0 of your Caffe model is the same stuff that is going into channel 0 of your TensorFlow-converted model.
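As a rough sketch, assuming the image was loaded with PIL/numpy in RGB order as you described (the filename is just a placeholder), swapping to BGR is only a reversal of the last axis:

import numpy as np
from PIL import Image

rgb = np.asarray(Image.open('example.jpg'))  # shape (H, W, 3), RGB order from PIL
bgr = rgb[:, :, ::-1]                        # reverse the channel axis -> BGR

If your Caffe pipeline read images with OpenCV (which gives BGR), then the TensorFlow input should be BGR too.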
Also, check that you're doing all the same pre-processing that you did for the Caffe model (any centering or normalizing, etc.).
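For example, the -128 centering and 1/255 scaling from my C++ snippet above would look roughly like this in a numpy pipeline (a sketch of that same preprocessing, not something the converter does for you):

import numpy as np

def preprocess(hwc_uint8):
    # Same centering and scaling as the C++ snippet: (pixel - 128) * (1 / 255).
    return (hwc_uint8.astype(np.float32) - 128.0) * (1.0 / 255.0)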
It's surprising that the output changes from run to run. I don't know the details of how BatchNormalization works, but my rough understanding is that it computes a normalization factor from statistics of the batch (at least during training). That's another possible source of variation if you're doing inference in batches and, for some reason, the BatchNormalization layer is still working in training mode. I see an issue here related to BatchNorm's moving average.
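I'm not sure how the converter wires this up, but in plain TensorFlow the batch-norm layer takes an explicit training flag; a minimal sketch (not the converter's API, and the input shape is a placeholder):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 224, 224, 3])
is_training = tf.placeholder_with_default(False, [])  # keep False at inference

# With training=False the layer uses the stored moving mean/variance
# instead of per-batch statistics, so repeated runs give identical outputs.
net = tf.layers.batch_normalization(inputs, training=is_training)

If the converted graph is somehow using batch statistics at inference time, the output would depend on whatever else is in the batch, which could explain the run-to-run variation.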