
tiny-YOLO Implementation

Open META-DREAMER opened this issue 6 years ago • 15 comments

I am working on implementing tiny-YOLO using PipeCNN and was just looking for some advice and guidance on the best way to do it and the steps I should take.

I'm going to convert the tiny-YOLO weights file from darknet -> caffe and then use the MATLAB fixed-point toolbox to convert that to the fixed-point weights that PipeCNN will use.

How should I go about updating the layer_config? What exactly is its format?

I will also be updating main.cpp to work with a webcam feed.

Is there anything else here that I missed? What else will I need to do to get tiny-YOLO running?

META-DREAMER · Mar 21 '18

I have also been working on implementing tiny-yolo-voc on my DE1-SoC.

SmartRoof · Mar 22 '18

@SmartRoof Have you made any progress?

META-DREAMER · Mar 23 '18

@hammadj We have it running, but it's slower than we expected.

zhao-lun · Mar 23 '18

@johnnydept How did you set up layer_config.h? And what did you do for your weights file? I converted the tiny-yolo-voc (cfg/weights) to Caffe caffemodel and prototxt files, merged the batch-norm layers into the conv layers, and then used the MATLAB script to convert the result into a weights.dat file. Did you do the same? Also, how is the performance, and what board are you using?

META-DREAMER · Mar 23 '18

Here is where I am so far for the layer_config. Does this look okay?

// TINY YOLO CONFIGURATION
unsigned layer_config[][NUM_CONFIG_ITEM] = {
	{ // Layer1
		// layer_type (conv = 0, fc = 1)
		0, 
		//data_w, data_h, data_n, weight_w, weight_h, weight_n, weight_m, bias_size
		416, 416, 3, 3, 3, 3, 16, 16,
		// memrd_src (0-> data_buf, 1-> output_buf)
		0,
		// conv_x, conv_y, conv_z, conv_stride, conv_padding, conv_split, conv_relu
		416, 416, 16, 1, 1, 1, 1,
		// pool_on, pool_x, pool_y, pool_z, pool_size, pool_stride,
		1, 208, 208, 16, 2, 2,
		// lrn control (on = 1, off = 0)
		0,
		// memwr_dst (0-> data_buf, 1-> output_buf)
		1
	},
	{ // Layer 2
		0,
		208, 208, 16, 3, 3, 8, 32, 32,
		1,
		208, 208, 32, 1, 1, 1, 1,
		1, 104, 104, 32, 2, 2,
		0,
		0
	},
	{ // Layer 3
		0,
		104, 104, 32, 3, 3, 8, 64, 64,
		0,
		104, 104, 64, 1, 1, 1, 1,
		1, 52, 52, 64, 2, 2,
		0,
		1
	},
	{ // Layer 4
		0,
		52, 52, 64, 3, 3, 8, 128, 128,
		1,
		52, 52, 128, 1, 1, 1, 1,
		1, 26, 26, 128, 2, 2,
		0,
		0
	},
	{ // Layer 5
		0,
		26, 26, 128, 3, 3, 8, 256, 256,
		0,
		26, 26, 256, 1, 1, 1, 1,
		1, 13, 13, 256, 2, 2,
		0,
		1
	},
	{ // Layer 6
		0,
		13, 13, 256, 3, 3, 8, 512, 512,
		1,
		13, 13, 512, 1, 1, 1, 1,
		1, 13, 13, 512, 2, 1,
		0,
		0
	},
	{ // Layer 7
		0,
		13, 13, 512, 3, 3, 8, 1024, 1024,
		0,
		13, 13, 1024, 1, 1, 1, 1,
		0, 13, 13, 1024, 2, 1,
		0,
		1
	},
	{ // Layer 8
		0,
		13, 13, 1024, 3, 3, 8, 1024, 1024,
		1,
		13, 13, 1024, 1, 1, 1, 1,
		0, 13, 13, 1024, 2, 1,
		0,
		0
	},
	{ // Layer 9
		0,
		13, 13, 1024, 1, 1, 8, 125, 125,
		0,
		13, 13, 125, 1, 0, 1, 0,
		0, 13, 13, 125, 2, 1,
		0,
		1
	},
};

signed char precision_config[][3] = {
	{ 8,  0, -4},  // Layer-1
	{ 8,  0, -2},  // Layer-2
	{ 8,  0, -1},  // Layer-3
	{ 8, -1, -1},  // Layer-4
	{ 8, -1, -1},  // Layer-5
	{ 8, -1,  0},  // Layer-6
	{ 8,  0,  2},  // Layer-7
	{ 8,  2,  2},  // Layer-8
	{ 8,  2,  2}   // Layer-9
};

unsigned input_config[4] = {416, 416, 3, 1}; //original image size(dim1, dim2, dim3), batch size

unsigned output_config[3] = {13, 13, 125}; // Layer-9  Note: only one result is extracted and verified

I've been getting errors about the pooling on layer 6 (Error: incorrect setting of pooling input/output size for layer-6!!!). If I disable pooling on layer 6 it starts running, but then hangs at "Launching kernel MemWr with local size...". I am testing this in sw_emu, btw.

Here's my setup in main.cpp:

#define IMAGE_FILE_SIZE   (416*416*3)
#define WEIGHTS_FILE_SIZE 15730592
#define LAYER_NUM         9
#define CONV_NUM          9
const char *weight_file_path = "./data/yolo/weights.dat";
const char *input_file_path = "./data/yolo/dog.dat";

And here is my weights file, the caffe model, the matlab script to generate weights, as well as the input file: tiny-yolo-config.zip

@doonny Do you have any idea where I could be going wrong?

META-DREAMER · Mar 24 '18

Tiny-yolo uses SAME padding in its max pools, meaning the stride-1 pool in layer 6 outputs the same size as its input. For that, I think you have to manually add some padding. https://stackoverflow.com/a/48393040/1558037
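Just to illustrate the arithmetic, here is a rough MATLAB check (my own sketch of the size formulas, not PipeCNN's actual host-side check):

% layer 6: 13x13 input, 2x2 max pool, stride 1
in_size = 13; pool_size = 2; stride = 1;
out_valid = floor((in_size - pool_size) / stride) + 1      % = 12, what you get with no padding
out_same  = ceil(in_size / stride)                         % = 13, what darknet's SAME padding produces
out_pad1  = floor((in_size + 1 - pool_size) / stride) + 1  % = 13, i.e. one extra row/column of padding

So the config asks for a 13x13 pool output, but without padding that pool can only produce 12x12, which is why the size check complains.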

zhao-lun · Mar 24 '18

@johnnydept I'm still having trouble getting it to run. Can you share the layer_config/weights you used?

META-DREAMER · Mar 25 '18

@hammadj The thing is, I used a floating-point implementation; fixed point (or using the COCO dataset) is my next work plan. We have it running on the DE1-SoC at 8 s/image.

zhao-lun · Mar 25 '18

@johnnydept Do you know what could be causing a hang? It's stuck on the clWaitForEvents call on layer 1, so padding on layer 6 shouldn't even matter at this point since it's only layer 1.

META-DREAMER · Mar 25 '18

Debugged a bit more and found the place where it is hanging: it's in the memWrite function in conv_pipe.cl, on the line that says output = read_channel_intel(pool_ch);. It hangs at (x=112, y=61) for some reason.

@aazz44ss Would you have any idea whats wrong here?

META-DREAMER · Mar 26 '18

@johnnydept @doonny OK, so I finally got tiny-YOLO running. The problem was that the uchar type used in many places can't hold values larger than 255 (and the feature-map dimensions here go up to 416), so I switched those out for ushort and it's running now.

However, the output I get is not as expected. I've attached the result dump here: result_dump.txt.

I feel like it's because I need to set up the precision config properly for tiny-YOLO. Any idea what the proper precision_config should be for tiny-YOLO? Do I need to change anything in how I am converting the weights? This is my MATLAB script for converting the weights right now (a possible per-layer tweak is sketched after the script):

caffe.set_mode_cpu();

model   = './caffe/tiny-yolo-nobn.prototxt';
weights = './caffe/tiny-yolo-nobn.caffemodel';

net = caffe.Net(model, weights, 'test');

% gather the weights and biases of the 9 conv layers from the Caffe model
netparams = {{net.params('conv1',1).get_data(), net.params('conv1',2).get_data()}, ...
             {net.params('conv2',1).get_data(), net.params('conv2',2).get_data()}, ...
             {net.params('conv3',1).get_data(), net.params('conv3',2).get_data()}, ...
             {net.params('conv4',1).get_data(), net.params('conv4',2).get_data()}, ...
             {net.params('conv5',1).get_data(), net.params('conv5',2).get_data()}, ...
             {net.params('conv6',1).get_data(), net.params('conv6',2).get_data()}, ...
             {net.params('conv7',1).get_data(), net.params('conv7',2).get_data()}, ...
             {net.params('conv8',1).get_data(), net.params('conv8',2).get_data()}, ...
             {net.params('conv9',1).get_data(), net.params('conv9',2).get_data()}};

% fixed-point format per layer: 8-bit signed words, 8 fractional bits
WeightWidth = [8; 8; 8; 8; 8; 8; 8; 8; 8];
WeightFrac  = [8; 8; 8; 8; 8; 8; 8; 8; 8];

MathType = fimath('RoundingMethod', 'Nearest', 'OverflowAction', 'Saturate', 'ProductMode', 'FullPrecision', 'SumMode', 'FullPrecision');

% quantize each layer's weights and biases to the fixed-point type above
for i = 1:9
    WeightType{i} = numerictype('Signed', 1, 'WordLength', WeightWidth(i), 'FractionLength', WeightFrac(i));
    weight{i} = fi(netparams{i}{1}, WeightType{i}, MathType);
    bias{i}   = fi(netparams{i}{2}, WeightType{i}, MathType);
end

% write weights followed by biases for each layer, as raw int8, in layer order
fid = fopen('weights.dat', 'w');
for i = 1:9
    fwrite(fid, storedInteger(weight{i}), 'int8');
    fwrite(fid, storedInteger(bias{i}), 'int8');
end
fclose(fid);
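One thing I might try next (a rough, untested sketch of my own, not something from PipeCNN): instead of hard-coding FractionLength = 8 for every layer, derive it per layer from the largest absolute weight/bias value so nothing saturates:

% untested sketch: pick per-layer fraction lengths from the actual value ranges
% (reuses netparams from the script above; words stay 8-bit signed)
for i = 1:9
    v_max = max(max(abs(netparams{i}{1}(:))), max(abs(netparams{i}{2}(:))));
    int_bits = floor(log2(double(v_max))) + 1;   % integer bits needed to cover v_max
    WeightFrac(i) = 8 - 1 - int_bits;            % 1 sign bit + int_bits + fraction bits = 8
end

Whatever fraction lengths come out of this would presumably also have to be reflected in precision_config on the host side; I still need to work out exactly how the three numbers per layer there relate to these fraction lengths.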

META-DREAMER · Mar 28 '18

@hammadj What do you mean by "merged the batch-norm layers into the conv layers"? Did you write a kernel that does BN after convolution? If so, are you willing to share the code? Thank you.

myih · Jun 02 '18

Hey @hammadj @johnnydept @doonny, I am also thinking of implementing YOLOv2 or tiny-YOLO on an FPGA using OpenCL, and I am thinking of using PipeCNN as a reference. To do this, which files do I need to change for this to work for YOLO? I need to write the kernel files, layer config file, and host files according to YOLO, right? Is that all I need to change?

It would be a great help if you could help me with this, because I am still new to OpenCL. Thanks in advance!

Thilanka97 · Nov 13 '18

@hammadj What do you mean by "merged the batch-norm layers into the conv layers"? Did you write a kernel that does BN after convolution? If so, are you willing to share the code? Thank you.

It refers to a fused batch-norm layer. When training is finished and the graph is frozen, you can take the normalization mean, variance, scale, and shift and fuse them into the conv weights and biases; after that there is no need to do batch norm in your inference.

For more information you can look at the TF Lite quantization description.
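As a rough sketch in MATLAB (variable names are placeholders: mu, var_, gamma, beta are the per-channel BatchNorm/Scale parameters; W and b are the weights and biases of the conv layer in front of it), the fusion per output channel looks like:

% sketch of folding BatchNorm (+ Scale) into the preceding conv layer
for c = 1:num_out                                  % num_out = number of output channels
    s = gamma(c) / sqrt(var_(c) + bn_eps);         % per-channel scale factor
    W_fused(:,:,:,c) = s * W(:,:,:,c);             % scale that channel's conv kernels
    b_fused(c) = s * (b(c) - mu(c)) + beta(c);     % fold the mean and shift into the bias
end
% afterwards the BatchNorm/Scale layers are dropped from the prototxt,
% so inference runs only conv + bias

This is presumably what the tiny-yolo-nobn model earlier in this thread already contains.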

sinaasadiyan · Jul 22 '19

I am working on implementing tiny-YOLO using PipeCNN and was just looking for some advice and guidance on the best way to do it and the steps I should take.

I'm going to convert the tiny-YOLO weights file from darknet -> caffe and then use the MATLAB fixed-point toolbox to convert that to the fixed-point weights that PipeCNN will use.

How should I go about updating the layer_config? What exactly is its format?

I will also be updating main.cpp to work with a webcam feed.

Is there anything else here that I missed? What else will I need to do to get tiny-YOLO running?

We recently developed a CNN accelerator for the Darknet reference model, which could be helpful for implementing tiny-YOLO. We used the DE10-Nano, based on an Intel Cyclone V SoC FPGA, for the implementation. You can check out the entire design flow for the accelerator and the relevant code in this repository: Link

tirumalnaidu · Sep 17 '20