Some doubts about your code

Open • Linxxx opened this issue 4 years ago • 9 comments

Your research is nice, but is your code actually runnable? I tried to run the iPerceiveDVC code and ran into lots of low-level bugs. Could you please explain them? Thanks.

```python
class EncoderLayer(nn.Module):

    def __init__(self, d_model, dout_p, H, d_ff):
        super(EncoderLayer, self).__init__()
        self.res_layers = clone(ResidualConnection(d_model, dout_p), 2)
        # Discard encoder's self-multiheaded attention module in place for common-sense features
        # self.self_att = MultiheadedAttention(d_model, H)
        self.feed_forward = PositionwiseFeedForward(d_model, d_ff)

    def forward(self, x, src_mask):  # x - (B, seq_len, d_model) src_mask (B, 1, S)
        # sublayer should be a function which inputs x and outputs transformation
        # thus, lambda is used instead of just `self.self_att(x, x, x)` which outputs
        # the output of the self attention
        # NOTE: self.self_att is commented out in __init__, so calling sublayer0 raises AttributeError
        sublayer0 = lambda x: self.self_att(x, x, x, src_mask)
        sublayer1 = self.feed_forward
        x = self.res_layers[0](x, sublayer0)
        x = self.res_layers[1](x, sublayer1)
        return x  # x - (B, seq_len, d_model)
```

```python
def forward(self, pred, target):  # pred (B, S, V), target (B, S)
    # Note: preds are expected to be after log
    B, S, V = pred.shape
    # (B, S, V) -> (B * S, V); (B, S) -> (B * S)
    pred = pred.contiguous().view(-1, V)
    target = target.contiguous().view(-1)

    dist = self.smoothing * torch.ones_like(pred) / (V - 2)
    # add smoothed ground-truth to prior (args: dim, index, src (value))
    dist.scatter_(1, target.unsqueeze(-1).long(), 1-self.smoothing)
    # make the padding token to have zero probability
    dist[:, self.pad_idx] = 0
    # ?? mask: 1 if target == pad_idx; 0 otherwise
    mask = torch.nonzero(target == self.pad_idx)
    if mask.sum() > 0 and len(mask) > 0:
        # dim, index, val
        dist.index_fill_(0, mask.squeeze(), 0)
    #return F.kl_div(pred, dist, reduction='sum')

    return F.kl_div(pred, dist, reduction='sum') + self.bce_loss + self.reg_loss + self.l2_loss
```
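The target distribution built in this `forward()` can be checked on a toy example. The NumPy sketch below restates it, assuming `target` is already flattened to shape (N,): the true token gets `1 - smoothing`, the remaining mass is spread over the other `V - 2` non-padding tokens, and positions whose target is padding are zeroed. It uses a boolean row mask instead of `torch.nonzero`; as far as I can tell, `mask.sum() > 0` in the quoted code sums row *indices*, so it would skip the fill when the only padded position is row 0. `smoothed_targets` is a hypothetical helper name.

```python
import numpy as np


def smoothed_targets(target, V, smoothing, pad_idx):
    """Hypothetical NumPy restatement of the smoothed target distribution.

    target: (N,) integer token ids, already flattened from (B, S).
    Returns dist with shape (N, V).
    """
    N = target.shape[0]
    # Spread the smoothing mass over V - 2 tokens (all but the true token and padding).
    dist = np.full((N, V), smoothing / (V - 2))
    # The true token gets 1 - smoothing (the role of scatter_ in the original).
    dist[np.arange(N), target] = 1.0 - smoothing
    # The padding token never receives probability mass.
    dist[:, pad_idx] = 0.0
    # Rows whose target is padding are zeroed entirely, selected by a boolean mask.
    dist[target == pad_idx] = 0.0
    return dist


target = np.array([3, 0, 2, 1])  # with pad_idx = 1, the last position is padding
dist = smoothed_targets(target, V=5, smoothing=0.3, pad_idx=1)
# Non-padding rows sum to 1 (up to floating-point rounding); the padding row is all zeros.
print(dist.sum(axis=1))
```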

Linxxx, Jan 15 '21 03:01

I've uncommented the line that defines self_att. I'm not sure what you're trying to highlight with the latter part of the code snippet though.
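A minimal, self-contained sketch of the layer with the self-attention module defined again. It is not the repository's code: `nn.MultiheadAttention` (PyTorch 1.9+ for `batch_first`) stands in for `MultiheadedAttention`, and `clone`, `ResidualConnection`, and the feed-forward block are simplified hypothetical stand-ins (the pre-norm residual is an assumption). It also assumes `src_mask` holds 1 for positions to keep, which is inverted into the `key_padding_mask` that `nn.MultiheadAttention` expects.

```python
import copy

import torch
import torch.nn as nn


def clone(module, n):
    # Hypothetical stand-in for the repo's clone(): n independent deep copies.
    return nn.ModuleList([copy.deepcopy(module) for _ in range(n)])


class ResidualConnection(nn.Module):
    # Assumed pre-norm residual: x + dropout(sublayer(norm(x))).
    def __init__(self, d_model, dout_p):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dout_p)

    def forward(self, x, sublayer):
        return x + self.dropout(sublayer(self.norm(x)))


class EncoderLayer(nn.Module):
    def __init__(self, d_model, dout_p, H, d_ff):
        super().__init__()
        self.res_layers = clone(ResidualConnection(d_model, dout_p), 2)
        # self_att is defined again so that forward() can call it;
        # torch's nn.MultiheadAttention stands in for the repo's MultiheadedAttention.
        self.self_att = nn.MultiheadAttention(d_model, H, dropout=dout_p, batch_first=True)
        # Simplified position-wise feed-forward stand-in.
        self.feed_forward = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dout_p),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x, src_mask):  # x (B, S, d_model), src_mask (B, 1, S) with 1 = keep
        # nn.MultiheadAttention wants True at key positions that must be ignored.
        key_padding_mask = ~src_mask.squeeze(1).bool()

        def sublayer0(y):
            out, _ = self.self_att(y, y, y, key_padding_mask=key_padding_mask, need_weights=False)
            return out

        x = self.res_layers[0](x, sublayer0)
        x = self.res_layers[1](x, self.feed_forward)
        return x  # (B, S, d_model)


torch.manual_seed(0)
layer = EncoderLayer(d_model=16, dout_p=0.1, H=4, d_ff=32).eval()
x = torch.randn(2, 5, 16)
src_mask = torch.ones(2, 1, 5)
src_mask[1, 0, 3:] = 0  # last two positions of the second clip are padding
out = layer(x, src_mask)
print(out.shape)  # torch.Size([2, 5, 16])
```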

amanchadha, Jan 15 '21 06:01

The following code raises `TypeError: unsupported operand type(s) for +: 'Tensor' and 'BCEWithLogitsLoss'`. The problem seems to be this line: `return F.kl_div(pred, dist, reduction='sum') + self.bce_loss + self.reg_loss + self.l2_loss`.

```python
def forward(self, pred, target):  # pred (B, S, V), target (B, S)
    B, S, V = pred.shape
    pred = pred.contiguous().view(-1, V)
    target = target.contiguous().view(-1)

    dist = self.smoothing * torch.ones_like(pred) / (V - 2)
    # add smoothed ground-truth to prior (args: dim, index, src (value))
    dist.scatter_(1, target.unsqueeze(-1).long(), 1-self.smoothing)
    # make the padding token to have zero probability
    dist[:, self.pad_idx] = 0
    # ?? mask: 1 if target == pad_idx; 0 otherwise
    mask = torch.nonzero(target == self.pad_idx)

    if mask.sum() > 0 and len(mask) > 0:
        # dim, index, val
        dist.index_fill_(0, mask.squeeze(), 0)
    #return F.kl_div(pred, dist, reduction='sum')

    return F.kl_div(pred, dist, reduction='sum') + self.bce_loss + self.reg_loss + self.l2_loss
```
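Judging from the error message, `self.bce_loss` is an `nn.BCEWithLogitsLoss` module object, not a tensor; a loss module has to be *called* on its inputs before its result can be added to the KL term. The sketch below reproduces the error and the working pattern with toy tensors. `bce_logits` and `bce_targets` are hypothetical inputs, since the thread does not show what the original code meant to feed into the BCE, regularization, and L2 terms; `self.reg_loss` and `self.l2_loss` would need the same treatment only if they are also modules.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: 4 flattened positions over a 6-token vocabulary.
pred = F.log_softmax(torch.randn(4, 6), dim=-1)  # kl_div expects log-probabilities
dist = F.softmax(torch.randn(4, 6), dim=-1)      # stand-in for the smoothed targets
kl_term = F.kl_div(pred, dist, reduction='sum')  # a 0-d tensor

bce_loss = nn.BCEWithLogitsLoss()  # an nn.Module, i.e. a callable, not a tensor

# Adding the module object itself reproduces the reported TypeError.
try:
    kl_term + bce_loss
    add_error = None
except TypeError as exc:
    add_error = str(exc)
print(add_error)

# Calling the module on its inputs returns a 0-d tensor that can be summed.
bce_logits = torch.randn(4)                        # hypothetical logits
bce_targets = torch.tensor([0.0, 1.0, 1.0, 0.0])  # hypothetical 0/1 labels
total = kl_term + bce_loss(bce_logits, bce_targets)
print(total)
```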


Linxxx, Jan 15 '21 09:01

Hi @Linxxx, have you used csfeatures? I'd like to ask some questions about them.

BNU-Wu, Jan 19 '21 14:01

@BNU-Wu Yes, I tried to use csfeatures but failed; I ran into some bugs.

Linxxx, Jan 20 '21 10:01

Did you run into these problems? Regarding csfeatures, I have two questions for you.

Question 1: When I run your iPerceive, each frame yields N 1024-dimensional features (N is the number of detected objects). How did you merge them into a single 1024-dimensional feature representing the frame?

Question 2: csfeatures is combined with video_stack_rgb through the hstack function, and running the code reports an error. The reason is that the merged video_stack_rgb is 2048-dimensional while video_stack_flow is 1024-dimensional, so they cannot be combined. How did you solve this problem? If you can provide the csfeatures file or a solution, I will be very grateful for your help.
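The dimension mismatch in Question 2 can be reproduced with random arrays. This sketch assumes `csfeatures` has already been pooled to one 1024-d vector per time step and that the downstream code combines the RGB and flow streams with an operation that needs equal widths (an element-wise sum is used for illustration). Appending the same common-sense features to the flow stream is shown only as one hypothetical way to keep the widths equal; the model's video input dimension would then have to be 2048 as well.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8  # number of feature time steps (hypothetical)

video_stack_rgb = rng.random((T, 1024))   # I3D RGB features
video_stack_flow = rng.random((T, 1024))  # I3D flow features
csfeatures = rng.random((T, 1024))        # assumed: one pooled common-sense vector per step

# hstack on 2-D arrays concatenates along the feature axis: 1024 + 1024 -> 2048.
rgb_cs = np.hstack([video_stack_rgb, csfeatures])
print(rgb_cs.shape)  # (8, 2048)

# Any combination that needs equal widths now fails (element-wise sum shown here).
try:
    _ = rgb_cs + video_stack_flow
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)  # True

# Hypothetical workaround: append the same common-sense features to the flow stream.
flow_cs = np.hstack([video_stack_flow, csfeatures])
print(rgb_cs.shape == flow_cs.shape)  # True
```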

BNU-Wu, Jan 20 '21 10:01

@BNU-Wu I ran into the second problem. The feature can't be merged in, and directly adding the terms in the loss is also a problem: `return F.kl_div(pred, dist, reduction='sum') + self.bce_loss + self.reg_loss + self.l2_loss`

Linxxx, Jan 20 '21 11:01

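About the first question: after running iPerceive, do you get multiple object features for each frame? How did you fuse them into a single 1024-dimensional feature? About the second question: after fusing the CS features with the I3D features via hstack, the result is already 2048-dimensional, so how can it be fused with video_stack_flow?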

BNU-Wu, Jan 20 '21 11:01

@BNU-Wu, for the first part I summed the per-object features to get a single feature representation, but I didn't get good results with that. I don't know whether this is the right approach; I also need an answer to this question.
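A sketch of collapsing the per-object features of one frame into a single vector. Summing (the approach described above) makes the vector's magnitude grow with the number of detected objects, so frames with many detections get larger features than frames with few; mean or max pooling keep the scale independent of N. This is offered as one possible factor, not a confirmed cause of the poor results. `pool_object_features` is a hypothetical helper, and returning a zero vector for frames with no detections is an assumption.

```python
import numpy as np


def pool_object_features(obj_feats, mode="sum"):
    """Hypothetical helper: collapse (N, D) per-object features of one frame into (D,).

    N may differ between frames and may be 0.
    """
    D = obj_feats.shape[1]
    if obj_feats.shape[0] == 0:
        return np.zeros(D)  # assumption: frames with no detections get a zero vector
    if mode == "sum":
        return obj_feats.sum(axis=0)   # scale grows with the number of objects
    if mode == "mean":
        return obj_feats.mean(axis=0)  # scale independent of the number of objects
    if mode == "max":
        return obj_feats.max(axis=0)   # strongest response per feature dimension
    raise ValueError(f"unknown mode: {mode}")


frame = np.ones((3, 1024))  # three detected objects, 1024-d each
s = pool_object_features(frame, "sum")
m = pool_object_features(frame, "mean")
print(s[0], m[0])  # 3.0 1.0
```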

siyamsajeebkhan, Feb 12 '21 11:02

Hi @siyamsajeebkhan, I used the same method as you, and the results were not good either. I don't know what the problem is.

BNU-Wu, Feb 23 '21 01:02