Learning-Action-Completeness-from-Points
About THUMOS'14 labels
Hello, in THUMOS'14, CliffDiving is a subclass of Diving, and the CliffDiving action instances in the annotation file also belong to Diving. Why don't you use this prior knowledge to remove the CliffDiving instances from the Diving class during training, and to add a Diving prediction for each predicted CliffDiving instance during post-processing? I think an action instance belonging to two categories may make training difficult to converge.
Thanks for your suggestion!
In fact, I have noticed some papers on fully-supervised temporal action localization that use such a label engineering technique.
However, to my knowledge, existing weakly-supervised approaches do not use it.
Therefore, we did not adopt it, in order to ensure a fair comparison with previous works, although it may bring some performance gains.
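For reference, here is a minimal sketch of the label engineering idea described above, which our code does not use; the annotation/prediction dictionary format and the function names are hypothetical:

```python
# Hypothetical sketch of the CliffDiving/Diving label engineering trick
# discussed above (NOT part of the official codebase).

def remove_duplicate_diving(annotations):
    """Drop 'Diving' ground-truth instances that exactly overlap a
    'CliffDiving' instance, so each instance keeps a single label."""
    cliff = {(a["start"], a["end"]) for a in annotations
             if a["label"] == "CliffDiving"}
    return [a for a in annotations
            if not (a["label"] == "Diving" and (a["start"], a["end"]) in cliff)]

def expand_cliffdiving_predictions(predictions):
    """At post-processing, duplicate every predicted 'CliffDiving'
    instance as an additional 'Diving' prediction."""
    extra = [dict(p, label="Diving") for p in predictions
             if p["label"] == "CliffDiving"]
    return predictions + extra
```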
Thanks for your reply. For the point annotations of THUMOS'14, SF-Net provides four annotation files. Were these four files manually annotated? Also, were the THUMOS'14 point annotations uniformly sampled from the ground truth, as mentioned in your paper, generated by yourself or provided by other papers?
As I have stated in the paper, we used the automatically generated point-level labels that are provided by Moltisanti et al. (CVPR'19).
The point-level labels can be found on their project page, specifically the 'train_df_ts_in_gt.csv' file.
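As a quick illustration, the CSV can be inspected with pandas; the exact column names are not guaranteed here, so check the actual header of 'train_df_ts_in_gt.csv' first:

```python
import pandas as pd

# Sketch: inspect the automatically generated point labels from
# Moltisanti et al. (CVPR'19). The column names are an assumption,
# so print the header before relying on any of them.
df = pd.read_csv("train_df_ts_in_gt.csv")
print(df.columns)   # actual column names of the annotation file
print(df.head())    # each row corresponds to one labeled point inside a GT instance
```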
In the paper, you perform experiments comparing different label distributions: Manual, Uniform, and Gaussian. Where did you get the Manual and Uniform labels?
The manual labels are provided by SF-Net, while the Uniform-distributed labels are generated using ground-truth intervals in the dataset construction stage before the training starts.
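For clarity, here is a minimal sketch of how Uniform-distributed point labels can be drawn from ground-truth intervals at dataset construction time (the function and variable names are illustrative, not taken from the official code):

```python
import random

def sample_uniform_points(gt_instances, seed=0):
    """For each ground-truth instance (start, end, label), draw one
    point label uniformly at random inside the interval."""
    rng = random.Random(seed)
    points = []
    for start, end, label in gt_instances:
        t = rng.uniform(start, end)   # uniform distribution over the GT interval
        points.append((t, label))
    return points

# Example: one GT instance of 'Diving' from 12.3s to 17.8s
print(sample_uniform_points([(12.3, 17.8, "Diving")]))
```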
I see SF-Net, but it provides four single-frame text files. Are these four files manually annotated? Do you use one of the txt files?
All four files contain manual annotations from different annotators. For selection, we followed the SF-Net official code, which randomly chooses an annotator id for each video in the dataset construction stage.
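A rough sketch of that selection step, in the spirit of the SF-Net code rather than copied from it:

```python
import random

ANNOTATOR_IDS = [1, 2, 3, 4]  # one id per single-frame annotation file

def choose_annotators(video_names, seed=0):
    """Randomly assign one annotator id to each video at dataset
    construction time."""
    rng = random.Random(seed)
    return {name: rng.choice(ANNOTATOR_IDS) for name in video_names}

print(choose_annotators(["video_validation_0000051", "video_validation_0000052"]))
```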
Thanks for your reply. Have you noticed that in the annotation file THUMOS2.txt, the CliffDiving videos are not also marked with their parent class Diving, while in the other annotation files the CliffDiving videos still belong to the parent class Diving?
I am not sure whether there are any papers that reduce the CliffDiving class to the Diving class. An example of the opposite case is the WTAL-C codebase, which is widely used as the baseline for many other works. You may check how others handle it by navigating to their code links here.
Hi, I find that the split_test.txt you provide lacks three videos, for example video_test_0000270. I want to know the reason.
I followed the implementation of STPN, where it is mentioned that the test split of THUMOS'14 is the same as that of SSN.
In the SSN paper, the authors mentioned that "2 falsely annotated videos (“270”,“1496”) in the test set are excluded in evaluation" and they used only 210 testing videos for evaluation.
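In code, this exclusion amounts to filtering the two video ids before evaluation; the variable names below are only illustrative:

```python
# Exclude the two falsely annotated THUMOS'14 test videos (per the SSN paper),
# which leaves 210 videos for evaluation.
EXCLUDED_VIDEOS = {"video_test_0000270", "video_test_0001496"}

def filter_test_split(video_names):
    """Drop the excluded videos from the test split before evaluation."""
    return [v for v in video_names if v not in EXCLUDED_VIDEOS]
```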
Thank you very much.
Hello, I'm sorry to bother you. I am a beginner and would like to ask why some fully-supervised methods, such as ActionFormer, use feature lengths that are inconsistent with the feature lengths you provide. Is it because I3D uses different sampling rates when extracting features?
The feature lengths depend on the sampling rate and the total number of frames. ActionFormer adopts a smaller stride of 4 (vs. 16 for ours) with a video fps of 30 (vs. 25 for ours).
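Concretely, the feature length is roughly the total number of frames divided by the stride, so the same video yields very different lengths under the two settings; a small worked example with a hypothetical 60-second video:

```python
def num_features(duration_sec, fps, stride):
    """Approximate feature length: total frames divided by the stride
    (ignoring boundary/padding details, which differ across extractors)."""
    total_frames = int(duration_sec * fps)
    return total_frames // stride

duration = 60.0  # a hypothetical 60-second video
print(num_features(duration, fps=25, stride=16))  # ours: 25 fps, stride 16 -> 93
print(num_features(duration, fps=30, stride=4))   # ActionFormer: 30 fps, stride 4 -> 450
```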
Hello, I would like to ask: the point labels are frame-level, while the video is divided into 16-frame segments. So how do you apply the point-level classification loss? One is a frame and the other is a segment. Looking forward to your reply.
The segment within which the labeled point (frame) falls is used as a positive sample for the point-level loss.
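In other words, the labeled frame is simply mapped to the segment that contains it; a minimal sketch assuming non-overlapping 16-frame segments:

```python
def point_to_segment(frame_index, segment_len=16):
    """Map a labeled frame to the index of the 16-frame segment it falls in."""
    return frame_index // segment_len

# Example: a point label at frame 250 supervises segment 15
# (frames 240-255) as a positive sample for the point-level loss.
print(point_to_segment(250))  # -> 15
```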
Thank you for your reply, and I wish you a happy life!