SLF-RPM
Question about Sparsity-Based Temporal Augmentation
Hi 👋, great job! I still have a question about the Sparsity-Based Temporal Augmentation. In Section 4 (Experiment Setup) of the paper you mention: "each augmented clip was constrained to have length of 30-frame. Therefore, the longest clip (i.e., stride of 5) contained 5-second information, while the shortest clip (i.e., stride of 1) had 1-second information."

However, as far as I know, given two fixed-length video clips (30 frames here), if they have different effective frame rates, they represent different apparent heart rates: the higher the effective fps, the lower the apparent heart rate, and the lower the effective fps, the higher the apparent heart rate. The Sparsity-Based Temporal Augmentation used in the paper will therefore generate two 30-frame clips with different apparent HRs from the very same source video, and I suppose the other physiological signals differ in the same way.

Since the two fixed-length clips carry different physiological signals, how can they be trained with a contrastive loss? Intuitively, the contrastive loss used in the paper should help the model learn the invariance of physiological signals across two views of the same video, and the differences in physiological signals across different videos. This is a bit confusing; I would be very grateful if you could answer these questions. Many thanks!
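To make the concern concrete, here is a toy sketch (not code from the paper; the signal parameters, frame rate, and function names are my own assumptions) showing that a strided 30-frame clip, if treated as an ordinary 30-frame window, implies a heart rate scaled by the stride:

```python
import numpy as np

FPS = 30        # assumed source-video frame rate
HR_HZ = 1.0     # assumed true pulse frequency (60 bpm), chosen for a clean DFT bin
CLIP_LEN = 30   # fixed clip length, as stated in the paper

t = np.arange(FPS * 10) / FPS            # 10 s of frame timestamps
pulse = np.sin(2 * np.pi * HR_HZ * t)    # toy periodic "physiological" signal

def apparent_freq(stride):
    """Dominant frequency (cycles per clip frame) of a strided 30-frame clip."""
    clip = pulse[::stride][:CLIP_LEN]
    spec = np.abs(np.fft.rfft(clip - clip.mean()))
    k = int(np.argmax(spec))             # dominant DFT bin
    return k / CLIP_LEN

for s in (1, 5):
    # If the clip is naively interpreted as 30 frames at 30 fps,
    # the implied heart rate scales linearly with the stride.
    implied_bpm = apparent_freq(s) * FPS * 60
    print(f"stride={s}: implied HR at {FPS} fps = {implied_bpm:.0f} bpm")
# stride=1: implied HR at 30 fps = 60 bpm
# stride=5: implied HR at 30 fps = 300 bpm
```

So the two augmented views of the same video look like they carry different pulse frequencies, which is what makes me wonder how the contrastive loss can treat them as a positive pair.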