gudrb

Results 11 comments of gudrb

Thank you for answering. I am not using the class token, but I still tried the skip=1 option, and it raises a key error when I load...

Now, it is working. I modified the code for the MiniAttention class from (https://github.com/microsoft/Cream/blob/4a13c4091e78f9abd2160e7e01c02e48c1cf8fb9/MiniViT/Mini-DeiT/mini_vision_transformer.py#L97) to manually process my patch sequence (9 x 2) from a spectrogram image. # image relative...

Do I need to crop or interpolate pretrained relative positional encoding parameters when the sequence length is changed? When I use the pretrained Mini-DeiT with positional encodings (both absolute and...
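One common way to handle a changed patch grid is to resample the absolute position-embedding grid, sketched below in dependency-free Python. This is only an illustration under assumptions: the function name `resize_pos_embed`, the 14 x 14 source grid, and the "align corners" mapping are hypothetical choices, not taken from the Mini-DeiT code, and the sketch covers absolute grid embeddings only, not relative ones. In PyTorch the same idea is usually expressed with `torch.nn.functional.interpolate`.

```python
def resize_pos_embed(grid, new_h, new_w):
    """Bilinearly resample a position-embedding grid.

    grid[i][j] is the embedding vector (list of floats) for patch row i, column j.
    Returns a new_h x new_w grid of vectors with the same embedding dimension.
    """
    old_h, old_w = len(grid), len(grid[0])
    dim = len(grid[0][0])

    def src_coord(dst_index, dst_size, src_size):
        # Map an output index to a fractional input index so that the first
        # and last positions line up (the "align corners" convention).
        if dst_size == 1 or src_size == 1:
            return 0.0
        return dst_index * (src_size - 1) / (dst_size - 1)

    out = []
    for i in range(new_h):
        y = src_coord(i, new_h, old_h)
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        wy = y - y0
        row = []
        for j in range(new_w):
            x = src_coord(j, new_w, old_w)
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            wx = x - x0
            vec = []
            for d in range(dim):
                # Interpolate along the width first, then along the height.
                top = (1 - wx) * grid[y0][x0][d] + wx * grid[y0][x1][d]
                bottom = (1 - wx) * grid[y1][x0][d] + wx * grid[y1][x1][d]
                vec.append((1 - wy) * top + wy * bottom)
            row.append(vec)
        out.append(row)
    return out


# Hypothetical example: a 14 x 14 pretrained grid resampled to a 9 x 2 grid.
pretrained = [[[float(i), float(j)] for j in range(14)] for i in range(14)]
resized = resize_pos_embed(pretrained, 9, 2)
```

Cropping (keeping only the first rows and columns) is the cheaper alternative; which of the two preserves accuracy better for a given model is an empirical question this sketch does not answer.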

In the MiniViT paper: We make several modifications on DeiT: First, we remove the [class] token. The model is attached with a global average pooling layer and a fully-connected layer...

Hello, I have a question regarding the implementation of layer normalization in the MiniViT paper and the corresponding code. Specifically, I am referring to how layer normalization is applied between...

Hello, Thank you for your kind reply. I noticed that Relative Position Encoding (RPE) is applied only on the key value. In the MiniViT paper, I couldn't see the explicit...

![20240702_164802](https://github.com/microsoft/Cream/assets/37917310/80ab7832-5fe1-4479-bcd4-a79e2691f66b) In the equations provided in the MiniViT paper, does `K_m^T` actually represent `(K'_m + r_m)^T`, where `r` are trainable positional identifiers? In the code, iRPE is used, but the...
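To make the reading proposed in the question concrete, the expansion below assumes that `K_m^T` does stand for `(K'_m + r_m)^T`. The symbol `Q_m` for the query matrix of head `m` is introduced here for illustration; whether the paper intends this reading is exactly what the question asks and is not confirmed here.

```latex
% Assumption: K_m = K'_m + r_m, with r_m the trainable positional term on the keys.
Q_m K_m^{T} = Q_m \left( K'_m + r_m \right)^{T}
            = Q_m {K'_m}^{T} + Q_m r_m^{T}
```

Under that assumption the attention logits split into a content term and a query-dependent positional term, which is the general shape of a relative position encoding applied to the keys only.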

I am using my own training code but following ast_models.py to define ASTModel. From memory_profiler I can see that, at line 176 of the screenshot, my htop CPU memory...

I also see this memory leak with your ESC-50 training code, but not as much as with mine. I think the difference from mine is the sequence length. I use original...

Thank you for answering my questions. The problem was not in the timm library or the AST code. While training, there was a tensor operation without using .item() such...
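A typical form of this kind of leak is accumulating a loss tensor instead of its scalar value. The sketch below shows the general pattern only; the model, data, and variable names are hypothetical and are not the training code discussed above.

```python
import torch

# Hypothetical toy model and optimizer, only to produce a loss with autograd history.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

leaky_total = 0.0  # becomes a tensor that carries autograd history
safe_total = 0.0   # stays a plain Python float

for step in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Leaky: the running sum is a tensor, so each step's autograd history
    # stays reachable through it and memory use can grow with every step.
    leaky_total = leaky_total + loss

    # Safe: .item() copies the scalar out as a Python float, so nothing
    # from the graph is kept alive.
    safe_total += loss.item()
```

The same applies to values appended to a list for logging: storing `loss.item()` (or `loss.detach()` when a tensor is needed) avoids holding on to the graph.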