SimOn
SimOn copied to clipboard
Various questions
Hi there, I've been reading the paper and checking your code, and I tried to apply it to another custom dataset for OnTAL. I have a few questions regarding the code:
- in the constructor of TRNTHUMOSDataLayer, you have
target = target_all[session]["anno"]
# pad feature and target
if len(video_feature) % self.stride != 0:
tmp_data = np.zeros((self.stride - len(video_feature) % self.stride, args.dim_feature))
video_feature = np.concatenate((video_feature, tmp_data), axis=0)
# extend gt annotation
tmp_data_anno = np.full((self.stride - len(video_feature) % self.stride, target.shape[-1]), -1)
target = np.concatenate((target, tmp_data_anno), axis=0)
So, you create tmp_data and concatenate it to video_feature, so that video_feature.shape[0] is a multiple of self.stride. However, immediately below you're doing the same for target:
tmp_data_anno = np.full((self.stride - len(video_feature) % self.stride, target.shape[-1]), -1)
but at this point, len(video_feature) % self.stride = 0, because you just concatenated it with tmp_data. Shouldn't you save the original len of video_feature, and use that one instead?
original_len = len(video_feature)
if original_len % self.stride != 0:
tmp_data = np.zeros((self.stride - original_len % self.stride, args.dim_feature))
video_feature = np.concatenate((video_feature, tmp_data), axis=0)
# extend gt annotation
tmp_data_anno = np.full((self.stride - original_len, target.shape[-1]), -1)
target = np.concatenate((target, tmp_data_anno), axis=0)
- In the same function, you are using:
num_windows = int((current_video_num_feature_frames + self.stride - self.num_fframes_in_window) / self.stride)
can't you just do:
num_windows =video_features.shape[0] / self.stride
can't you just do:
num_windows =video_features.shape[0] / self.stride
- in the function utils.frame_level_map_n_cap, you do:
this_cls_prob = all_probs[i, :]
this_cls_gt = all_labels[i, :]
w = np.sum(this_cls_gt == 0) / np.sum(this_cls_gt == 1)
indices = np.argsort(-this_cls_prob)
tp, psum, cpsum = 0, 0.0, 0.0
for k, idx in enumerate(indices):
if this_cls_gt[idx] == 1:
tp += 1
wtp = w * tp
fp = (k + 1) - tp
psum += tp / (tp + fp)
cpsum += wtp / (wtp + fp)
I don't understand why you consider the entry a TP only by checking if the corresponding GT value is equal to 1, it's like you're expecting all the elements in this_cls_prob to be positives.
I thought this_cls_prob was an array of probabilities that that specific snippet belongs to a class, so that you could have something like (assume using a threshold of 0.7):
-------------------------------------
idx | 0 | 1 | 2 | 3 | ...
-------------------------------------
pred_prob | 0.9 | 0.8 | 0.7 | 0.4 | ...
frame_gt | 1 | 0 | 1 | 1 | ...
-------------------------------------
res | TP | FP | TP | FN | ...
-------------------------------------
And in this case, I would expect you should use something like:
if this_cls_prob > threshold and this_cls_gt[idx] == 1:
# this is a TP
if this_cls_prob < threshold and this_cls_gt[idx] == 0:
# this is a TN
and so on, but maybe I just misinterpreted the meaning of the arrays?