
Could the authors tell me how to apply the TimeDRL method to ResNet structure?

Open fuen1590 opened this issue 1 year ago • 6 comments

I have no idea how to apply TimeDRL to a ResNet structure as you said in your paper, because there are no patches and no [CLS] token. Thanks!

fuen1590 avatar May 16 '24 12:05 fuen1590

As long as the encoder keeps the same input and output shape (B, T, C), you can use any model architecture you want. You can also switch to another encoder architecture using the code here: https://github.com/blacksnail789521/TimeDRL/blob/master/models/_load_encoder.py#L101
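For example, a minimal 1D-ResNet-style encoder that keeps the (B, T, C) shape could look like the sketch below (hypothetical names in PyTorch, not the exact code in the repo):

```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """One residual block over the time axis; preserves both C and T."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2  # "same" padding for odd kernel sizes
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.norm1 = nn.BatchNorm1d(channels)
        self.norm2 = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, C, T)
        out = self.act(self.norm1(self.conv1(x)))
        out = self.norm2(self.conv2(out))
        return self.act(out + x)  # residual connection keeps the temporal length

class ResNetEncoder(nn.Module):
    """Maps (B, T, C) -> (B, T, C), so it can serve as a drop-in encoder."""
    def __init__(self, channels: int, num_blocks: int = 3):
        super().__init__()
        self.blocks = nn.Sequential(
            *[ResidualBlock1D(channels) for _ in range(num_blocks)]
        )

    def forward(self, x):         # x: (B, T, C)
        x = x.transpose(1, 2)     # (B, C, T) for Conv1d
        x = self.blocks(x)
        return x.transpose(1, 2)  # back to (B, T, C)

# Quick shape check (hypothetical sizes):
x = torch.randn(8, 96, 32)
assert ResNetEncoder(channels=32)(x).shape == x.shape
```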

blacksnail789521 avatar May 16 '24 12:05 blacksnail789521

Thank you for the reply, but I still have some questions. For example, if there is no distinction between instance features and patch features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both. In this case, does applying TimeDRL to ResNet mean treating one feature map as the instance feature and the rest as patch features?

fuen1590 avatar May 16 '24 12:05 fuen1590

What do you mean by the instance features and the patch features?

blacksnail789521 avatar May 16 '24 12:05 blacksnail789521

Sorry, by "instance features" I mean the instance-level embeddings, and by "patch features" I mean the timestamp-level embeddings, as in your paper.

fuen1590 avatar May 16 '24 13:05 fuen1590

if there is no distinction between instance features and patch features in the feature maps generated by ResNet, then TimeDRL cannot carry out targeted self-supervision tasks for both.

Regardless of the encoder architecture, the [CLS] token's corresponding embedding is always the instance-level embedding, while the rest are always the timestamp-level embeddings (or patch-level embeddings, since we are currently using patches). Since the [CLS] token is at the beginning, if we have T_p patches, then with the [CLS] token we have 1 + T_p tokens as the input. Consequently, the output also has 1 + T_p embeddings: the first one is the instance-level embedding, and the rest are the timestamp-level embeddings. As you can see, all these concepts are independent of the encoder's architecture.
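In terms of shapes, this works out to something like the following sketch (variable names are hypothetical, not taken from the repo):

```python
import torch
import torch.nn as nn

B, T_p, D = 32, 12, 64                        # batch, number of patches, embedding dim
patch_emb = torch.randn(B, T_p, D)            # patch embeddings
cls_token = nn.Parameter(torch.zeros(1, 1, D))

# Prepend the [CLS] token, giving 1 + T_p tokens per sample.
x = torch.cat([cls_token.expand(B, -1, -1), patch_emb], dim=1)  # (B, 1 + T_p, D)

# Stand-in for any encoder that preserves the (B, T, C) shape
# (e.g., a Transformer or the ResNet sketch above).
encoder = nn.Identity()
z = encoder(x)                                # (B, 1 + T_p, D)

instance_emb = z[:, 0, :]    # (B, D)      instance-level embedding from the [CLS] position
timestamp_emb = z[:, 1:, :]  # (B, T_p, D) timestamp-level (patch-level) embeddings
```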

blacksnail789521 avatar May 16 '24 13:05 blacksnail789521

Okay. I understand. Thanks very much!

fuen1590 avatar May 16 '24 13:05 fuen1590