Jittor-MLP

question

Open 123456789asdfjkl opened this issue 2 years ago • 3 comments

Thank you for implementing this! Could you explain why this code is written the way it is, and what the padding operation is applied to? https://github.com/liuruiyang98/Jittor-MLP/blob/b86656b65cf5f18ba9eb760d1f7565ed95e7e96e/models_pytorch/morph_mlp.py#L23

123456789asdfjkl, Jun 22 '22 07:06

Thanks for your interest.

As of now, MorphMLP has not been officially open-sourced. I implemented it from Figure 3 and Figure 4 of the paper, based on my own understanding.

As for the padding operation, the code in lines 51-54 handles the case where H or W is not divisible by L. This detail is not discussed in the paper; I added it myself.

I will keep an eye on the open-source status of MorphMLP and update this implementation to match the official version once it is released, to ensure it is correct.

P_l, P_r, P_t, P_b = (self.L - W % self.L) // 2, (self.L - W % self.L) - (self.L - W % self.L) // 2, (self.L - H % self.L) // 2, (self.L - H % self.L) - (self.L - H % self.L) // 2
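
To make the arithmetic in that expression easier to follow, here is a minimal standalone sketch of the same split, written in plain Python so it runs without PyTorch. The helper name `symmetric_padding` and the sizes used below are hypothetical and do not come from the repository. Note one assumption: taken on its own, the expression above yields a full extra chunk of padding when a side is already divisible by L, so the sketch returns zero padding in that case. This thread does not show whether the repository guards against that case.

```python
from typing import Tuple


def symmetric_padding(size: int, chunk: int) -> Tuple[int, int]:
    """Return (before, after) padding so that size + before + after
    is a multiple of chunk. Hypothetical helper for illustration."""
    remainder = size % chunk
    if remainder == 0:
        # Assumption: no padding is wanted when size already divides evenly.
        return 0, 0
    total = chunk - remainder      # same as (L - W % L) in the expression above
    before = total // 2            # P_l or P_t
    after = total - before         # P_r or P_b, takes the odd leftover
    return before, after


# Hypothetical feature-map size and chunk length.
H, W, L = 14, 9, 4
P_t, P_b = symmetric_padding(H, L)
P_l, P_r = symmetric_padding(W, L)
print(P_l, P_r, P_t, P_b)
```

With these values the width needs three padded columns, split as one on the left and two on the right, and the height needs one padded row on each side, so both padded sides become multiples of L.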

liuruiyang98, Jun 22 '22 08:06

Thank you for your answer. My understanding is that the input is flattened by row or by column and then split into chunks of length L. Do you agree? The paper says:

We set chunk length to L and thus obtain X_i ∈ R^{L×C}, where i ∈ {1, ..., HW/L}.

123456789asdfjkl, Jun 22 '22 08:06

Yes, you are right. When I first read the paper, I found this point confusing. In Figure 3, L appears to be the group size along the H and W directions, but given Table 7 and the network configuration, how can L=49 be achieved in stage 4? That is only possible if the 7x7 feature map is completely flattened.

Maybe the meaning of L becomes clearer if the feature map is represented as (B, HW, C), as in MLP-Mixer, instead of (B, C, H, W) as in Figure 3. 😂
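
The flattened reading can be sketched in a few lines of plain Python. This is only an illustration of the interpretation discussed in this thread, not the paper's or the repository's code: the function name `flatten_and_chunk` is hypothetical, the flattening order is assumed to be row-major, and the batch and channel dimensions are left out so that each token is represented by its (row, column) position.

```python
from typing import List, Tuple

Position = Tuple[int, int]


def flatten_and_chunk(h: int, w: int, chunk_len: int) -> List[List[Position]]:
    """Flatten an h x w grid row by row, then cut it into chunks of chunk_len.
    Hypothetical helper; assumes h * w is divisible by chunk_len."""
    if (h * w) % chunk_len != 0:
        raise ValueError("h * w must be divisible by chunk_len")
    # Row-major flattening: (0, 0), (0, 1), ..., (0, w - 1), (1, 0), ...
    positions = [(row, col) for row in range(h) for col in range(w)]
    return [positions[i:i + chunk_len] for i in range(0, h * w, chunk_len)]


# 7x7 feature map with L = 49: a single chunk that covers the whole map.
whole_map = flatten_and_chunk(7, 7, 49)
print(len(whole_map), len(whole_map[0]))

# Same map with L = 7 (hypothetical value): each chunk is exactly one row.
rows = flatten_and_chunk(7, 7, 7)
print(len(rows), rows[1][0], rows[1][-1])
```

Under this reading the number of chunks is HW/L, so L=49 on a 7x7 map gives one chunk spanning every position, which a group size limited to a single row or column could not reach.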

liuruiyang98, Jun 22 '22 08:06