No reuse dimension found in the body for tensor input
I am trying to reuse the input image in a conv2d layer in the LeNet example. The reuse_at primitive works fine with the placeholder inputs (i.e., input_image in the first conv2d). However, when passing the max-pooled result to the second conv2d layer, no reuse pattern is found for it.
import heterocl as hcl
import hlib
import numpy as np

batch_size = 1000
qtype1 = hcl.Fixed(16, 14)
qtype2 = hcl.Fixed(16, 14)

def build_lenet(input_image, weight_conv1, weight_conv2,
                weight_fc1, weight_fc2, lenet):
    # first conv
    conv1 = hlib.nn.conv2d_nchw(input_image, weight_conv1, "conv1")
    tanh1 = hlib.nn.tanh(conv1, "tanh1")
    pool1 = hlib.nn.max_pool(tanh1, kernel=(2,2), stride=(2,2), name="pool1")
    # second conv
    conv2 = hlib.nn.conv2d_nchw(pool1, weight_conv2, name="conv2")
    tanh2 = hlib.nn.tanh(conv2, "tanh2")
    pool2 = hlib.nn.max_pool(tanh2, kernel=(2,2), stride=(2,2))
    # first fc
    flat = hlib.nn.flatten(pool2)
    fc1 = hlib.nn.dense(flat, weight_fc1)
    tanh3 = hlib.nn.tanh(fc1, "tanh3")
    # second fc
    fc2 = hlib.nn.dense(tanh3, weight_fc2)
    # loss
    return hlib.nn.softmax(lenet, fc2)

input_image = hcl.placeholder((batch_size, 1, 28, 28), "input_image")
weight_conv1 = hcl.placeholder((20, 1, 5, 5), "weight_conv1", qtype1)
weight_conv2 = hcl.placeholder((50, 20, 5, 5), "weight_conv2", qtype1)
weight_fc1 = hcl.placeholder((500, 800), "weight_fc1", qtype1)
weight_fc2 = hcl.placeholder((10, 500), "weight_fc2", qtype1)
lenet = hcl.placeholder((batch_size, 10), "lenet")

s = hcl.create_schedule([input_image, weight_conv1, weight_conv2,
                         weight_fc1, weight_fc2, lenet], build_lenet)
s[build_lenet.conv1].compute_at(s[build_lenet.tanh1], build_lenet.tanh1.axis[3])
s.reuse_at(input_image, s[build_lenet.conv1], build_lenet.conv1.axis[0])
s.reuse_at(build_lenet.pool1._op, s[build_lenet.conv2], build_lenet.conv2.axis[1])
print(hcl.lower(s))
The error message is as follows:
check_call
raise TVMError(py_str(_LIB.TVMGetLastError()))
heterocl.tvm._ffi.base.TVMError: [14:12:42] src/pass/generate_reuse_buffer.cc:245: No reuse is found in axis nn
I think the problem here is the choice of axis. According to the error message, it seems the first reuse_at is incorrect: there is no reuse across the 0th dimension (i.e., the batch dimension), which makes sense.
That makes sense. Actually, reuse_at won't error out with a placeholder input (i.e., the first primitive) even if it is asked to find a reuse pattern at the batch level. The error message comes from the second reuse_at primitive, where the input is a tensor. Both work at the height or width level.
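For reference, a sketch (not verified end-to-end) of where the reuse_at calls could go instead, assuming axis[2] of each conv stage is its output-height loop:
# Sketch only: place reuse_at on a spatial axis, where the 5x5 sliding
# window actually revisits neighboring pixels; axis[2] is assumed to be
# the output-height loop of each conv2d stage.
s.reuse_at(input_image, s[build_lenet.conv1], build_lenet.conv1.axis[2])
s.reuse_at(build_lenet.pool1._op, s[build_lenet.conv2], build_lenet.conv2.axis[2])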
I suggest we leave this issue open so we know we are missing support for non-unit stride stencils. We should also document all the existing limitations of each customization primitive.
Also, do we report an error message for reuse_at() when there are no reuse opportunities?
My previous answer was wrong, so I deleted it. This issue is caused exactly by the absence of reuse opportunities, not by non-unit stride. As for the limitation, it is already documented in our online documentation. You can see it here.
Good to know. But does the compiler emit a proper error when reuse_at does not apply? Also, is there a fundamental challenge that prevents us from supporting non-unit stride?
For the first question, as you can see from the error message in the first post, it clearly specifies that axis nn has no reuse opportunities. For other types of limitations, the compiler will emit different messages. For the second question, the answer is no; we just need more engineering effort.
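To make the non-unit-stride limitation concrete, here is a hypothetical minimal reproducer (all identifiers below are illustrative and not taken from the LeNet example): a 1-D stride-2 stencil where adjacent outputs do share inputs, yet the sliding step is 2 rather than 1.
import heterocl as hcl

# Adjacent outputs overlap (B[x] reads A[2x..2x+2], B[x+1] reads A[2x+2..2x+4]),
# so a reuse buffer is possible in principle, but the window slides by 2.
A = hcl.placeholder((64,), "A")

def stride2_stencil(A):
    r = hcl.reduce_axis(0, 3, "r")
    return hcl.compute((31,), lambda x: hcl.sum(A[2 * x + r], axis=r), "B")

s = hcl.create_schedule([A], stride2_stencil)
# Expected to be rejected today (presumably with a "No reuse is found"-style
# error), matching the discussion above about non-unit stride support.
s.reuse_at(A, s[stride2_stencil.B], stride2_stencil.B.axis[0])
print(hcl.lower(s))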
Also, another limitation: reuse_at does not take effect when combined with the compute_at primitive. For example, consider the following snippet:
s[conv2].compute_at(s[tanh2], tanh2.axis[3]) # combine CONV with tanh
s.reuse_at(pool1._op, s[conv2], conv2.axis[2]) # linebuffer at index y
Here I want to merge the conv2d stage conv2 into the activation stage tanh2 with compute_at, and then reuse the max-pooled input from the previous stage (i.e., pool1, max-pooled from conv1). The IR is not as expected: a reuse buffer is allocated but never used, and no error message is thrown:
// attr [pool1.reuse] storage_scope = "global"
allocate pool1.reuse[int32 * 1]
// attr [tanh2] storage_scope = "global"
allocate tanh2[int32 * 1000 * 50 * 8 * 8]
produce tanh2 {
  // attr [0] extern_scope = 0
  for "app_name"="tanh" (args, 0, 1000) {
    for (args0, 0, 50) {
      for (args1, 0, 8) {
        for (args2, 0, 8) {
          // attr [conv2] storage_scope = "global"
          allocate conv2[int32 * 1 * 1 * 1 * 1]
          produce conv2 {
            // attr [0] extern_scope = 0
            // attr [reducer2] storage_scope = "global"
            allocate reducer2[float32 * 1]
            produce reducer2 {
              // attr [0] extern_scope = 0
              reducer2[0] = 0.000000f
            }
            for (ra5, 0, 20) {
              for (ra6, 0, 5) {
                for (ra7, 0, 5) {
                  reducer2[0] = (float32((int48(pool1[((((args2 + ra7) + ((args1 + ra6)*12)) + (ra5*144)) + (args*2880))])*fixed48_14(weight_conv2[(((ra7 + (ra6*5)) + (ra5*25)) + (args0*500))]))) + reducer2[0])
                }
              }
            }
            conv2[0] = int32(reducer2[0])
          }
          tanh2[(((args2 + (args1*8)) + (args0*64)) + (args*3200))] = int32(tanh(float64(conv2[0])))
        }
      }
    }
  }
}
If I apply reuse_at and then compute_at, the program crashes with a segfault.
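For completeness, the reversed ordering that triggers the crash (same identifiers as the snippet above):
s.reuse_at(pool1._op, s[conv2], conv2.axis[2])  # line buffer first ...
s[conv2].compute_at(s[tanh2], tanh2.axis[3])    # ... then merge stages: segfaults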