Mocha.jl
Support caffemodel directly
Yes, this was asked before in #55, and yes, caffemodels can be converted, but that's not good enough. It looks like they can be read directly using Julia's ProtoBuf.jl; see tanmaykm/ProtoBuf.jl#48.
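For instance, something roughly like this should work, assuming the caffe .proto definitions have already been compiled into a Julia module caffe with ProtoBuf.jl's protoc plugin (readproto is ProtoBuf.jl's real entry point; the file and module names here are just illustrative):
import ProtoBuf
include("caffe.jl")  # module generated from caffe.proto
net = open("bvlc_googlenet.caffemodel", "r") do io
  ProtoBuf.readproto(io, caffe.NetParameter())
end
println(net.name)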
To confirm, what test would you propose?
Following up, I wrote a little exploratory code; here are the layer types of GoogleNet. InnerProduct is supported, and I guess Convolution is too. What about the rest? In addition there are DATA, POOLING, RELU, SPLIT, SOFTMAX_LOSS, LRN, CONCAT, and DROPOUT.
julia> import CaffeOperations;
julia> x = CaffeOperations.loadCaffeeNetwork("bvlc_googlenet.caffemodel");
julia> reshape(CaffeOperations.layerTypes(x),(13,13))
13x13 Array{Symbol,2}:
:DATA :CONVOLUTION :CONCAT :CONVOLUTION :CONVOLUTION :INNER_PRODUCT … :RELU :DROPOUT :POOLING :RELU :RELU
:SPLIT :RELU :SPLIT :RELU :RELU :SOFTMAX_LOSS :CONVOLUTION :INNER_PRODUCT :CONVOLUTION :CONVOLUTION :CONVOLUTION
:CONVOLUTION :CONVOLUTION :CONVOLUTION :CONCAT :POOLING :CONVOLUTION :RELU :SOFTMAX_LOSS :RELU :RELU :RELU
:RELU :RELU :RELU :POOLING :CONVOLUTION :RELU :POOLING :CONVOLUTION :CONCAT :POOLING :CONVOLUTION
:POOLING :CONVOLUTION :CONVOLUTION :SPLIT :RELU :CONVOLUTION :CONVOLUTION :RELU :POOLING :CONVOLUTION :RELU
:LRN :RELU :RELU :CONVOLUTION :CONCAT :RELU … :RELU :CONVOLUTION :SPLIT :RELU :POOLING
:CONVOLUTION :CONVOLUTION :CONVOLUTION :RELU :SPLIT :CONVOLUTION :CONCAT :RELU :CONVOLUTION :CONCAT :CONVOLUTION
:RELU :RELU :RELU :CONVOLUTION :POOLING :RELU :SPLIT :CONVOLUTION :RELU :SPLIT :RELU
:CONVOLUTION :CONVOLUTION :CONVOLUTION :RELU :CONVOLUTION :CONVOLUTION :POOLING :RELU :CONVOLUTION :CONVOLUTION :CONCAT
:RELU :RELU :RELU :CONVOLUTION :RELU :RELU :CONVOLUTION :CONVOLUTION :RELU :RELU :POOLING
:LRN :POOLING :CONVOLUTION :RELU :INNER_PRODUCT :CONVOLUTION … :RELU :RELU :CONVOLUTION :CONVOLUTION :DROPOUT
:POOLING :CONVOLUTION :RELU :CONVOLUTION :RELU :RELU :INNER_PRODUCT :CONVOLUTION :RELU :RELU :INNER_PRODUCT
:SPLIT :RELU :POOLING :RELU :DROPOUT :POOLING :RELU :RELU :CONVOLUTION :CONVOLUTION :SOFTMAX_LOSS
julia> x.name
"GoogleNet"
All the layers mentioned here are supported. Check out the IJulia notebook of the pretrained ImageNet model for an example of the correspondence.
Yeah, I'm reading the docs now; it looks like a translation is possible. I'm looking at the layer types one by one.
So far it looks like all of the convolution layers have 2 blobs associated with them; is that expected?
As for the Xavier filler <--> Initializer correspondence, it looks like the Caffe model allows parameterization?
dump(x.layers[9].convolution_param)
...
weight_filler: CaffeOperations.caffe.FillerParameter
_type: ASCIIString "xavier"
value: Float32 0.0
min: Float32 0.0
max: Float32 1.0
mean: Float32 0.0
std: Float32 0.03 <--- here
sparse: Int32 -1
variance_norm: Int32 0
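For reference, here's the rough filler-to-initializer mapping I have in mind (a sketch: constructor names assumed from the Mocha docs, and the std field above is simply dropped since XavierInitializer takes no parameters):
function newInitializer(filler)
  # map a caffe FillerParameter to a Mocha initializer
  if filler._type == "xavier"
    return Mocha.XavierInitializer()
  elseif filler._type == "constant"
    return Mocha.ConstantInitializer(filler.value)
  else
    error("unsupported filler type: $(filler._type)")  # gaussian etc. still to do
  end
end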
Do you have a URL for that notebook?
Sorry I'm currently traveling and do not have a computer. So I'll try to be brief.
There is a link to the notebook in the tutorial section of the docs. Currently the Xavier initializer is not customizable, I believe, but it should be very easy to add a parameter.
Yes, the convolution layer expects two blobs, but you can always set the bias blob to zero if you do not need it.
That's fine, the Caffe file does have 2 blobs. Bias blob? Does that correspond to bottom?
make_blob(backend, ...
Should this have a default value of whatever the current backend is?
Yes, Caffe has two blobs for convolution. They are not bottoms; bottoms are input blobs. What we were talking about are parameter blobs.
I'm not sure I like the idea of a global backend. The idea is that a user should supply an initialized backend whenever they want to do something important. I think it is perfectly fine for the function that converts a Caffe model to accept a backend parameter.
So how are the parameter blobs connected to a Convolution layer? I see the only candidates are bottom and top. If not those, then what else is there?
@waTeim filters and bias are parameters of a layer. For example, in an InnerProductLayer, top = parameter * bottom. There are three kinds of blobs: input (bottom), output (top), and weight/filters (parameters).
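To make the roles concrete, a tiny sketch in plain Julia (not Mocha code):
bottom = rand(Float32, 4)   # input blob
W = rand(Float32, 3, 4)     # parameter blob: weights/filters
b = zeros(Float32, 3)       # parameter blob: bias
top = W * bottom .+ b       # output blob: top = parameter * bottom (+ bias)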
The part I'm having trouble with is the mapping. Here's the ProtoBuf description. There is a blobs field in the layers section. When read, this field is populated with 2 blobs. Which is which? They're not labeled.
julia> size(x.layers[9].blobs)
(2,)
Bottoms and tops are set to arrays of symbols, which I think refer to some index; how do the blobs get associated with those symbols? Does the .ipynb make it clear?
Current code, maybe wrong:
return Mocha.ConvolutionLayer(
name = caffeLayer.name,
n_filter = Int(caffeLayer.convolution_param.num_output),
kernel = kernel,
pad = pad,
stride = stride,
filter_init = newInitializer(caffeLayer.convolution_param.weight_filler),
bias_init = biasInitializer,
tops = getLayerRefList(caffeLayer.top),
bottoms = getLayerRefList(caffeLayer.bottom)
);
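For reference, the helpers used above are currently something like this (a sketch under my own assumptions: it only handles the square kernel_size/pad/stride fields of convolution_param, not kernel_h/kernel_w and friends):
p = caffeLayer.convolution_param
kernel = (Int(p.kernel_size), Int(p.kernel_size))
pad    = ProtoBuf.has_field(p, :pad)    ? (Int(p.pad), Int(p.pad))       : (0, 0)
stride = ProtoBuf.has_field(p, :stride) ? (Int(p.stride), Int(p.stride)) : (1, 1)
getLayerRefList(names) = Symbol[symbol(n) for n in names]  # tops/bottoms as symbols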
Is the x object a Mocha net or a Caffe net? In Mocha layer state, there is a field called blobs which holds references to output blobs, but you don't need to care about them as they will be created automatically. In contrast, in Caffe, IIRC, the blobs field holds the parameter blobs. You can do the following things with it:
- Ignore it, as the parameter blobs will be created automatically according to the specification (n_filter, etc.).
- Do cross checking to make sure that the shape of the parameter blobs matches the layer definition, e.g. that the n_filter parameter is correct.
- If the Caffe file contains an already trained model, you can actually copy those blobs out and use a customized initializer for the parameter blobs, so that they are filled with the trained parameters instead of random initial values.
x is a parsed trained Caffe net, so it looks like option 3. Is this simply a matter of creating a new Initializer type?
Yes, the easiest way I can imagine is to create an initializer that simply copies the content of an existing array to the target blob being initialized. Something roughly like:
ConvolutionLayer(..., filter_init=CopyInitializer(caffe_layer.blobs[1]), bias_init=CopyInitializer(caffe_layer.blobs[2]),...)
Took a while, but I'm back on it. Does this look about right?
immutable CopyInitializer <: Mocha.Initializer
  caffeBlob::caffe.BlobProto
end
function init(initializer::CopyInitializer, blob::Mocha.Blob)
  Mocha.fill!(blob, initializer.caffeBlob.data)
end
Yes, maybe with small modifications:
- I'm not sure whether the data in caffe.BlobProto will be retained after you close the protobuffer file. You might need to copy the data into a Julia array and hold the Julia array in your CopyInitializer instead.
- You should use Mocha.copy! instead of fill!, as fill! is only used to fill everywhere with a scalar.
I'm pretty sure that by the time it gets into caffe.BlobProto it's an array that exists independently of the file, so normal GC applies. Re using copy!, yeah, easy change.
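With both changes it becomes roughly this (assuming extending Mocha.init is the right hook for a custom initializer):
immutable CopyInitializer <: Mocha.Initializer
  data::Vector{Float32}
end
CopyInitializer(caffeBlob::caffe.BlobProto) = CopyInitializer(copy(caffeBlob.data))  # copy out of the proto object
function Mocha.init(initializer::CopyInitializer, blob::Mocha.Blob)
  Mocha.copy!(blob, initializer.data)
end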
Coming up next is the InnerProduct layer type, which seems to be straightforward, except it's not clear to me that Caffe's num_output is equivalent to Mocha's dim, though it did appear to be the only choice left.
Here's the Protobuf stuff:
type InnerProductParameter
num_output::UInt32
bias_term::Bool
weight_filler::FillerParameter
bias_filler::FillerParameter
axis::Int32
end #type InnerProductParameter
From caffe's docs:
Parameters (InnerProductParameter inner_product_param)
- Required num_output (c_o): the number of filters
- Strongly recommended weight_filler [default type: 'constant' value: 0]
- Optional bias_filler [default type: 'constant' value: 0]
- Optional bias_term [default true]: specifies whether to learn and apply a set of additive biases to the filter outputs
@waTeim Yes, num_output is exactly output_dim, and as before, the fillers correspond to initializers in Mocha.
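So the InnerProduct translation ends up roughly like this (the output_dim/weight_init/bias_init keyword names are assumed from the Mocha docs; newInitializer and getLayerRefList are my helpers from above):
function newInnerProductLayer(caffeLayer::caffe.V1LayerParameter)
  p = caffeLayer.inner_product_param
  return Mocha.InnerProductLayer(
    name        = caffeLayer.name,
    output_dim  = Int(p.num_output),
    weight_init = newInitializer(p.weight_filler),
    bias_init   = newInitializer(p.bias_filler),
    tops        = getLayerRefList(caffeLayer.top),
    bottoms     = getLayerRefList(caffeLayer.bottom)
  );
end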
Last layer type: the data layer, which comes with a TransformationParameter:
type TransformationParameter
scale::Float32
mirror::Bool
crop_size::UInt32
mean_file::AbstractString
mean_value::Array{Float32,1}
force_color::Bool
force_gray::Bool
TransformationParameter() = (o=new(); fillunset(o); o)
end #type TransformationParameter
A number of these things don't appear to be supported, except scale and mean. It looks like Caffe assumes both of these things happen simultaneously, while Mocha appears to want to apply one and then the other (presumably mean subtraction followed by scaling). Caffe appears to have multiple mean values (1 per channel?) while Mocha wants a blob.
What's the expected format of this blob?
Limited success.
- To keep things simple I used cifar10_nin.caffemodel from Model Zoo
- The output can be seen here.
- I just arbitrarily picked input blob dimensions of 10x10x1x1 which is almost certainly wrong.
The critical line is this:
x = CaffeOperations.convertCaffeNetwork("cifar10_nin.caffemodel",[(10,10,1,1),(10,10,1,1)]);
How do I determine the input blob dims? Does this come from the data?
scale and mean can be mapped to DataTransformers in Mocha.
Caffe specifies everything together, but technically they cannot happen "together". For example, Caffe subtracts the mean first, and then does re-scaling. See their code here: https://github.com/BVLC/caffe/blob/master/src/caffe/data_transformer.cpp#L113
Yes, the Mocha data transformer expects a mean blob, which should be of the same shape as a data point. Specifically, for image data, we can make this blob by duplicating the per-channel values at each pixel location. For example,
mean_channels = [1,2,3] # array of mean values, one for each of the RGB channels
img_width = 256
img_height = 256
mean_channels = reshape(mean_channels, (1,1,3)) # make it proper shape
mean_img = repeat(mean_channels, inner=[img_width,img_height,1]) # of proper layout for mean_blob
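Then, hedging on the exact keyword names (I am going from memory of the docs here), the two supported transforms can be wired up in Caffe's order, subtracting the mean first and then scaling; this assumes an initialized backend as discussed above, and that the mean image needs to be wrapped as a blob, e.g. with make_blob:
mean_blob = Mocha.make_blob(backend, mean_img)
transformers = [
  (:data, Mocha.DataTransformers.SubMean(mean_blob=mean_blob)),  # mean subtraction first
  (:data, Mocha.DataTransformers.Scale(scale=Float32(1/255)))    # then re-scaling (example value)
]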
The crop option can be supported by the CropLayer in Mocha. force_color and force_gray are not supported yet.
@waTeim That is brilliant! I'm not sure why you need to decide the input blob dims? I'm not sure whether the Caffe model stores this information somewhere. They will be automatically determined when the program starts reading data from the HDF5 files. Do you mean you need this shape information in the data transformer?
Hey, thanks! As far as dims, I kinda brought it on myself, as I'm trying to remain as agnostic as I can and am therefore using MemoryDataLayer. Potentially I could use LevelDB directly as well with some additional help.
Here's the still primitive function in question:
function newDataLayer(caffeLayer::caffe.V1LayerParameter, dims)
  # pre-allocate one data array per requested input blob shape
  data = Vector{Array}();
  for i = 1:length(dims)
    push!(data, Array(Float32, dims[i]))
  end
  # translate transform_param (only scale so far) into Mocha data transformers
  transformers::Vector = [];
  if ProtoBuf.has_field(caffeLayer, :transform_param)
    scale = Float32(caffeLayer.transform_param.scale)
    push!(transformers, Mocha.DataTransformers.Scale(scale));
  end
  return Mocha.MemoryDataLayer(
    name = caffeLayer.name,
    batch_size = 1,
    data = data,
    transformers = transformers,
    tops = getLayerRefList(caffeLayer.top)
  );
end
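As a follow-up on the dims question: one possibility I'm considering (heavily hedged, since I'm not sure a trained .caffemodel actually carries this) is to look for net-level input_dim entries in the NetParameter and fall back to user-supplied dims otherwise. Caffe orders dims as (num, channels, height, width) while Mocha blobs are (width, height, channels, num):
function guessInputDims(net)
  if ProtoBuf.has_field(net, :input_dim) && length(net.input_dim) == 4
    n, c, h, w = net.input_dim
    return [(Int(w), Int(h), Int(c), Int(n))]  # reorder to Mocha's (W,H,C,N)
  end
  return nothing  # caller falls back to explicitly supplied dims
end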