Mocha.jl
Support caffemodel directly
Yes, this was asked before in #55, and yes, caffemodels can be converted, but that's not good enough. It looks like they can be read directly using Julia's ProtoBuf.jl; see tanmaykm/ProtoBuf.jl#48.
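For instance, something roughly like this should work, assuming the caffe .proto definitions have already been compiled into a Julia module caffe with ProtoBuf.jl's protoc plugin (readproto is ProtoBuf.jl's real entry point; the file and module names here are just illustrative):
import ProtoBuf
include("caffe.jl")  # module generated from caffe.proto
net = open("bvlc_googlenet.caffemodel", "r") do io
  ProtoBuf.readproto(io, caffe.NetParameter())
end
println(net.name)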
To confirm, what test would you propose?
Following up, I wrote a little exploratory code; here are the layer types of GoogleNet. InnerProduct is supported, and I guess Convolution is too. What about the rest? In addition there are DATA, POOLING, RELU, SPLIT, SOFTMAX_LOSS, LRN, CONCAT, and DROPOUT.
julia> import CaffeOperations;
julia> x = CaffeOperations.loadCaffeeNetwork("bvlc_googlenet.caffemodel");
julia> reshape(CaffeOperations.layerTypes(x),(13,13))
13x13 Array{Symbol,2}:
:DATA :CONVOLUTION :CONCAT :CONVOLUTION :CONVOLUTION :INNER_PRODUCT … :RELU :DROPOUT :POOLING :RELU :RELU
:SPLIT :RELU :SPLIT :RELU :RELU :SOFTMAX_LOSS :CONVOLUTION :INNER_PRODUCT :CONVOLUTION :CONVOLUTION :CONVOLUTION
:CONVOLUTION :CONVOLUTION :CONVOLUTION :CONCAT :POOLING :CONVOLUTION :RELU :SOFTMAX_LOSS :RELU :RELU :RELU
:RELU :RELU :RELU :POOLING :CONVOLUTION :RELU :POOLING :CONVOLUTION :CONCAT :POOLING :CONVOLUTION
:POOLING :CONVOLUTION :CONVOLUTION :SPLIT :RELU :CONVOLUTION :CONVOLUTION :RELU :POOLING :CONVOLUTION :RELU
:LRN :RELU :RELU :CONVOLUTION :CONCAT :RELU … :RELU :CONVOLUTION :SPLIT :RELU :POOLING
:CONVOLUTION :CONVOLUTION :CONVOLUTION :RELU :SPLIT :CONVOLUTION :CONCAT :RELU :CONVOLUTION :CONCAT :CONVOLUTION
:RELU :RELU :RELU :CONVOLUTION :POOLING :RELU :SPLIT :CONVOLUTION :RELU :SPLIT :RELU
:CONVOLUTION :CONVOLUTION :CONVOLUTION :RELU :CONVOLUTION :CONVOLUTION :POOLING :RELU :CONVOLUTION :CONVOLUTION :CONCAT
:RELU :RELU :RELU :CONVOLUTION :RELU :RELU :CONVOLUTION :CONVOLUTION :RELU :RELU :POOLING
:LRN :POOLING :CONVOLUTION :RELU :INNER_PRODUCT :CONVOLUTION … :RELU :RELU :CONVOLUTION :CONVOLUTION :DROPOUT
:POOLING :CONVOLUTION :RELU :CONVOLUTION :RELU :RELU :INNER_PRODUCT :CONVOLUTION :RELU :RELU :INNER_PRODUCT
:SPLIT :RELU :POOLING :RELU :DROPOUT :POOLING :RELU :RELU :CONVOLUTION :CONVOLUTION :SOFTMAX_LOSS
julia> x.name
"GoogleNet"
All the layers mentioned here are supported. Check out the IJulia notebook of the pretrained ImageNet model for an example of the correspondence.
Yeah, I'm reading the docs now; it looks like a translation is possible. I'm looking at the layer types one by one.
So far it looks like all of the convolution layers have 2 blobs associated with them; is that expected?
As for the Xavier filler <--> Initializer correspondence, it looks like the Caffe model allows parameterization?
dump(x.layers[9].convolution_param)
...
weight_filler: CaffeOperations.caffe.FillerParameter
_type: ASCIIString "xavier"
value: Float32 0.0
min: Float32 0.0
max: Float32 1.0
mean: Float32 0.0
std: Float32 0.03 <--- here
sparse: Int32 -1
variance_norm: Int32 0
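For reference, here's the rough filler-to-initializer mapping I have in mind (a sketch: constructor names assumed from the Mocha docs, and the std field above is simply dropped since XavierInitializer takes no parameters):
function newInitializer(filler)
  # map a caffe FillerParameter to a Mocha initializer
  if filler._type == "xavier"
    return Mocha.XavierInitializer()
  elseif filler._type == "constant"
    return Mocha.ConstantInitializer(filler.value)
  else
    error("unsupported filler type: $(filler._type)")  # gaussian etc. still to do
  end
end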
Do you have a URL for that notebook?
Sorry I'm currently traveling and do not have a computer. So I'll try to be brief.
There is a link to the notebook in the tutorial section of the docs. Currently the Xavier initializer is not customizable, I believe, but it should be very easy to add a parameter.
Yes, the convolution layer expects two blobs, but you can always set the bias blob to zero if you do not need it.
That's fine, the Caffe file does have 2 blobs. Bias blob? Does that correspond to bottom?
make_blob(backend, ...
Should this have a default value of whatever the current backend is?
Yes, Caffe has two blobs for convolution. They are not bottoms; bottoms are input blobs. What we were talking about are parameter blobs.
I'm not sure I like the idea of a global backend. The idea is that a user should supply an initialized backend whenever they want to do something important. I think it is perfectly fine for the function that converts a Caffe model to accept a backend parameter.
So how are the parameter blobs connected to a Convolution layer? I see the only candidates are bottom and top. If not those, then what else is there?
@waTeim filters and bias are parameters of a layer. For example, in an InnerProductLayer, top = parameter * bottom. There are three kinds of blobs: input (bottom), output (top), and weight/filters (parameters).
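To make the roles concrete, a tiny sketch in plain Julia (not Mocha code):
bottom = rand(Float32, 4)   # input blob
W = rand(Float32, 3, 4)     # parameter blob: weights/filters
b = zeros(Float32, 3)       # parameter blob: bias
top = W * bottom .+ b       # output blob: top = parameter * bottom (+ bias)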
The part I'm having trouble with is the mapping. Here's the ProtoBuf description. There is a blobs field in the layers section. When read, this field is populated with 2 blobs. Which is which? They're not labeled.
julia> size(x.layers[9].blobs)
(2,)
Bottoms and tops are set to arrays of symbols, which I think refer to some index; how do the blobs get associated with those symbols? Does the .ipynb make it clear?
Current code, maybe wrong:
return Mocha.ConvolutionLayer(
name = caffeLayer.name,
n_filter = Int(caffeLayer.convolution_param.num_output),
kernel = kernel,
pad = pad,
stride = stride,
filter_init = newInitializer(caffeLayer.convolution_param.weight_filler),
bias_init = biasInitializer,
tops = getLayerRefList(caffeLayer.top),
bottoms = getLayerRefList(caffeLayer.bottom)
);
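For reference, the helpers used above are currently something like this (a sketch under my own assumptions: it only handles the square kernel_size/pad/stride fields of convolution_param, not kernel_h/kernel_w and friends):
p = caffeLayer.convolution_param
kernel = (Int(p.kernel_size), Int(p.kernel_size))
pad    = ProtoBuf.has_field(p, :pad)    ? (Int(p.pad), Int(p.pad))       : (0, 0)
stride = ProtoBuf.has_field(p, :stride) ? (Int(p.stride), Int(p.stride)) : (1, 1)
getLayerRefList(names) = Symbol[symbol(n) for n in names]  # tops/bottoms as symbols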
Is the x object a Mocha net or a Caffe net? In Mocha layer state, there is a field called blobs which holds references to output blobs, but you don't need to care about them as they will be created automatically. In contrast, in Caffe, IIRC, the blobs field holds the parameter blobs. You can do the following things with it:
- Ignore it, as the parameter blobs will be created automatically according to the specification (n_filter, etc.).
- Do cross checking to make sure that the shape of the parameter blobs matches the layer definition, e.g. that the n_filter parameter is correct.
- If the Caffe file contains an already trained model, you can actually copy those blobs out and use a customized initializer for the parameter blobs, so that they are filled with the trained parameters instead of random initial values.
x is a parsed trained Caffe net, so it looks like option 3. Is this simply a matter of creating a new Initializer type?
Yes, the easiest way I can imagine is to create an initializer that simply copies the content of an existing array to the target blob being initialized. Something roughly like:
ConvolutionLayer(..., filter_init=CopyInitializer(caffe_layer.blobs[1]), bias_init=CopyInitializer(caffe_layer.blobs[2]),...)
Took a while, but I'm back on it. Does this look about right?
immutable CopyInitializer <: Mocha.Initializer
  caffeBlob::caffe.BlobProto
end
function init(initializer::CopyInitializer, blob::Mocha.Blob)
  Mocha.fill!(blob, initializer.caffeBlob.data)
end
Yes, maybe with small modifications:
- I'm not sure whether the data in caffe.BlobProto will be retained after you close the protobuffer file. You might need to copy the data into a Julia array and hold the Julia array in your CopyInitializer instead.
- You should use Mocha.copy! instead of fill!, as fill! is only used to fill everywhere with a scalar.
I'm pretty sure that by the time it gets into caffe.BlobProto it's an array that exists independently of the file, so normal GC applies. Re using copy!, yeah, easy change.
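With both changes it becomes roughly this (assuming extending Mocha.init is the right hook for a custom initializer):
immutable CopyInitializer <: Mocha.Initializer
  data::Vector{Float32}
end
CopyInitializer(caffeBlob::caffe.BlobProto) = CopyInitializer(copy(caffeBlob.data))  # copy out of the proto object
function Mocha.init(initializer::CopyInitializer, blob::Mocha.Blob)
  Mocha.copy!(blob, initializer.data)
end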
Coming up next is the InnerProduct layer type, which seems to be straightforward, except it's not clear to me that Caffe's num_output is equivalent to Mocha's dim, though it did appear to be the only choice left.
Here's the Protobuf stuff:
type InnerProductParameter
num_output::UInt32
bias_term::Bool
weight_filler::FillerParameter
bias_filler::FillerParameter
axis::Int32
end #type InnerProductParameter
From caffe's docs:
Parameters (InnerProductParameter inner_product_param)
- Required num_output (c_o): the number of filters
- Strongly recommended weight_filler [default type: 'constant' value: 0]
- Optional bias_filler [default type: 'constant' value: 0]
- Optional bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
@waTeim Yes, num_output is exactly output_dim, and as before, the fillers correspond to initializers in Mocha.
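So the InnerProduct translation ends up roughly like this (the output_dim/weight_init/bias_init keyword names are assumed from the Mocha docs; newInitializer and getLayerRefList are my helpers from above):
function newInnerProductLayer(caffeLayer::caffe.V1LayerParameter)
  p = caffeLayer.inner_product_param
  return Mocha.InnerProductLayer(
    name        = caffeLayer.name,
    output_dim  = Int(p.num_output),
    weight_init = newInitializer(p.weight_filler),
    bias_init   = newInitializer(p.bias_filler),
    tops        = getLayerRefList(caffeLayer.top),
    bottoms     = getLayerRefList(caffeLayer.bottom)
  );
end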
Last layer type: the data layer, which comes with a TransformationParameter:
type TransformationParameter
scale::Float32
mirror::Bool
crop_size::UInt32
mean_file::AbstractString
mean_value::Array{Float32,1}
force_color::Bool
force_gray::Bool
TransformationParameter() = (o=new(); fillunset(o); o)
end #type TransformationParameter
A number of these things don't appear to be supported, except scale and mean. It looks like Caffe assumes both of these things happen simultaneously, while Mocha appears to want to apply one and then the other (presumably mean subtraction followed by scaling). Caffe appears to have multiple mean values (1 per channel?) while Mocha wants a blob.
What's the expected format of this blob?
Limited success.
- To keep things simple I used cifar10_nin.caffemodel from Model Zoo
- The output can be seen here.
- I just arbitrarily picked input blob dimensions of 10x10x1x1 which is almost certainly wrong.
The critical line is this:
x = CaffeOperations.convertCaffeNetwork("cifar10_nin.caffemodel",[(10,10,1,1),(10,10,1,1)]);
How do I determine the input blob dims? Does this come from the data?
scale and mean can be mapped to DataTransformers in Mocha.
Caffe specifies everything together, but technically they cannot happen "together". For example, Caffe subtracts the mean first, and then does re-scaling. See their code here: https://github.com/BVLC/caffe/blob/master/src/caffe/data_transformer.cpp#L113
Yes, the Mocha data transformer expects a mean blob, which should be of the same shape as a data point. Specifically, for image data, we can make this blob by duplicating the per-channel values at each pixel location. For example,
mean_channels = [1,2,3] # array of mean values, one for each of the RGB channels
img_width = 256
img_height = 256
mean_channels = reshape(mean_channels, (1,1,3)) # make it proper shape
mean_img = repeat(mean_channels, inner=[img_width,img_height,1]) # of proper layout for mean_blob
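Then, hedging on the exact keyword names (I am going from memory of the docs here), the two supported transforms can be wired up in Caffe's order, subtracting the mean first and then scaling; this assumes an initialized backend as discussed above, and that the mean image needs to be wrapped as a blob, e.g. with make_blob:
mean_blob = Mocha.make_blob(backend, mean_img)
transformers = [
  (:data, Mocha.DataTransformers.SubMean(mean_blob=mean_blob)),  # mean subtraction first
  (:data, Mocha.DataTransformers.Scale(scale=Float32(1/255)))    # then re-scaling (example value)
]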
The crop option can be supported by the CropLayer in Mocha. force_color and force_gray are not supported yet.
@waTeim That is brilliant! I'm not sure why you need to decide the input blob dims? I'm not sure whether the Caffe model stores this information somewhere. They will be automatically determined when the program starts reading data from the HDF5 files. Do you mean you need this shape information in the data transformer?
Hey, thanks! As far as dims, I kinda brought it on myself, as I'm trying to remain as agnostic as I can and am therefore using MemoryDataLayer. Potentially I could use LevelDB directly as well with some additional help.
Here's the still primitive function in question:
function newDataLayer(caffeLayer::caffe.V1LayerParameter, dims)
  # pre-allocate one data array per requested input blob shape
  data = Vector{Array}();
  for i = 1:length(dims)
    push!(data, Array(Float32, dims[i]))
  end
  # translate transform_param (only scale so far) into Mocha data transformers
  transformers::Vector = [];
  if ProtoBuf.has_field(caffeLayer, :transform_param)
    scale = Float32(caffeLayer.transform_param.scale)
    push!(transformers, Mocha.DataTransformers.Scale(scale));
  end
  return Mocha.MemoryDataLayer(
    name = caffeLayer.name,
    batch_size = 1,
    data = data,
    transformers = transformers,
    tops = getLayerRefList(caffeLayer.top)
  );
end
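As a follow-up on the dims question: one possibility I'm considering (heavily hedged, since I'm not sure a trained .caffemodel actually carries this) is to look for net-level input_dim entries in the NetParameter and fall back to user-supplied dims otherwise. Caffe orders dims as (num, channels, height, width) while Mocha blobs are (width, height, channels, num):
function guessInputDims(net)
  if ProtoBuf.has_field(net, :input_dim) && length(net.input_dim) == 4
    n, c, h, w = net.input_dim
    return [(Int(w), Int(h), Int(c), Int(n))]  # reorder to Mocha's (W,H,C,N)
  end
  return nothing  # caller falls back to explicitly supplied dims
end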