ocr-pytorch
Pipeline
**Hello everyone,
Could you please explain the following section of the code in relation to the pipeline? My question is: why did you use conv2d followed by conv1d? What are the benefits, and why did you choose this?**
```python
from torch import nn


def conv2d(in_channel, out_channel, kernel_size):
    layers = [
        nn.Conv2d(
            in_channel, out_channel, kernel_size, padding=kernel_size // 2, bias=False
        ),
        nn.BatchNorm2d(out_channel),
        nn.ReLU(),
    ]

    return nn.Sequential(*layers)


def conv1d(in_channel, out_channel):
    layers = [
        nn.Conv1d(in_channel, out_channel, 1, bias=False),
        nn.BatchNorm1d(out_channel),
        nn.ReLU(),
    ]

    return nn.Sequential(*layers)
```
```python
class OCR(nn.Module):
    def __init__(self, n_class, backbone, feat_channels=[768, 1024]):
        super().__init__()

        # I didn't see in the model how you refer to ResNet-101 or HRNet.
        self.backbone = backbone

        # What do feat_channels and ch16, ch32 mean?
        ch16, ch32 = feat_channels

        # What does this mean?
        self.L = nn.Conv2d(ch16, n_class, 1)
        self.X = conv2d(ch32, 512, 3)

        # I found in the article that phi and psi are transformation functions
        # used in self-attention. Could you please explain the benefit of
        # using these functions?
        self.phi = conv1d(512, 256)
        self.psi = conv1d(512, 256)
        self.delta = conv1d(512, 256)
        self.rho = conv1d(256, 512)
        self.g = conv2d(512 + 512, 512, 1)
        self.out = nn.Conv2d(512, n_class, 1)

        self.criterion = nn.CrossEntropyLoss(ignore_index=0)

    def forward(self, input, target=None):
        input_size = input.shape[2:]
        stg16, stg32 = self.backbone(input)[-2:]
        X = self.X(stg32)
```
Thanks in advance
> I didn't see in the model how you refer to ResNet-101 or HRNet.
> `self.backbone = backbone`
I made it so that you can pass a specific backbone into the model. See: https://github.com/rosinality/ocr-pytorch/blob/master/train.py#L179
> What do feat_channels and ch16, ch32 mean?
> `ch16, ch32 = feat_channels`
It specifies the number of channels of the stride-16 and stride-32 feature maps.
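For illustration, here is a hedged sketch of what the model expects: any module that returns a list of feature maps whose last two entries are the stride-16 and stride-32 maps can be passed as the backbone. `DummyBackbone` is a made-up stand-in, not the actual ResNet/HRNet wrapper from train.py.

```python
import torch
from torch import nn

# Hypothetical stand-in: any module whose output list ends with the
# stride-16 and stride-32 feature maps can serve as the backbone.
class DummyBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage16 = nn.Conv2d(3, 768, 3, stride=16, padding=1)
        self.stage32 = nn.Conv2d(768, 1024, 3, stride=2, padding=1)

    def forward(self, input):
        stg16 = self.stage16(input)   # (batch, 768, H/16, W/16)
        stg32 = self.stage32(stg16)   # (batch, 1024, H/32, W/32)
        return [stg16, stg32]

# feat_channels must match the channel counts of those two feature maps
model = OCR(n_class=21, backbone=DummyBackbone(), feat_channels=[768, 1024])
```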
> What does this mean?
> `self.L = nn.Conv2d(ch16, n_class, 1)`
> `self.X = conv2d(ch32, 512, 3)`
X corresponds to the pixel representations and L corresponds to the soft object regions. You can find this in version 1 of the paper.
phi & psi are for computing attention logits, similar to what is commonly used in self-attention. One difference is that an activation is used here; maybe the authors found that adding more nonlinearity works better.
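As a rough sketch of that idea (the shapes and the exact roles of phi and psi here are illustrative and may differ from the details in model.py): the pixel features and the object region features are projected to a common dimension, and their inner products give the attention logits.

```python
import torch

batch, channel, positions, n_class = 2, 512, 64, 4

X = torch.randn(batch, channel, positions)      # flattened pixel features
regions = torch.randn(batch, channel, n_class)  # aggregated object region features

phi = conv1d(512, 256)  # the conv1d helper defined above
psi = conv1d(512, 256)

query = phi(X)          # (batch, 256, positions)
key = psi(regions)      # (batch, 256, n_class)

# inner products between every pixel and every object region
logits = torch.einsum('bcn,bck->bnk', query, key)
attention = torch.softmax(logits, 2)            # (batch, positions, n_class)
```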
Hello Rosinality,
Thank you for your prompt response. Could you please answer these two questions:

- Why did you use conv2d followed by conv1d? What are the benefits of using conv2d and conv1d, and why did you consider both?
- I also have a somewhat futuristic question: if I want to change the OCR model and include attention in it, is that possible? Also, what modifications would you suggest to implement OCR?
Thank you
- I have used Conv1d as the feature maps will be flattened to compute self-attention. Actually, you can implement it solely using Conv2d (see the first sketch below).
- Do you want to add OCR attention to your models? One core idea of OCR is to create soft attention region maps (corresponding to each semantic class) and use them for attention (see the second sketch below). I think you need to check whether that is appropriate for your tasks and models.
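On the first point, here is a small sketch of the equivalence (my own illustration, not code from this repo): a kernel-size-1 Conv1d on flattened feature maps computes the same thing as a 1x1 Conv2d.

```python
import torch
from torch import nn

x = torch.randn(2, 512, 8, 8)

conv_2d = nn.Conv2d(512, 256, 1, bias=False)
conv_1d = nn.Conv1d(512, 256, 1, bias=False)
conv_1d.weight.data = conv_2d.weight.data.view(256, 512, 1)  # share the same weights

out_2d = conv_2d(x).view(2, 256, -1)      # (2, 256, 64)
out_1d = conv_1d(x.view(2, 512, -1))      # (2, 256, 64)

print(torch.allclose(out_2d, out_1d))     # True
```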
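On the second point, a minimal sketch of the soft object region idea (shapes are illustrative and details may differ from model.py): the coarse class logits L are softmaxed over the spatial positions and used to pool the pixel features into one representation per class.

```python
import torch

batch, channel, height, width, n_class = 2, 512, 8, 8, 4

L = torch.randn(batch, n_class, height, width)  # coarse per-class logits
X = torch.randn(batch, channel, height, width)  # pixel representations

# soft region maps: each class map is normalized over the spatial positions
region_maps = torch.softmax(L.view(batch, n_class, -1), 2)
pixels = X.view(batch, channel, -1)

# one aggregated feature vector per semantic class: (batch, channel, n_class)
object_regions = torch.einsum('bcn,bkn->bck', pixels, region_maps)
```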
In my case, there will be a grayscale image showing the temperature values of an object, where I have to use semantic segmentation to delineate between the defective and non-defective areas in one image. Could you please suggest some ideas? Thank you once again for your assistance.
You can try to use defective and non-defective as 2 classes and use OCR on it. I don't know whether OCR is very appropriate for your task, but it may be worth a try if you have some baseline segmentation models.
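For concreteness, a hedged sketch of that setup, reusing the hypothetical `DummyBackbone` stand-in from above. Note that the loss in the model uses `ignore_index=0`, so the class indexing of your labels matters.

```python
import torch

# grayscale temperature map; replicate to 3 channels if the backbone expects RGB
thermal = torch.randn(4, 1, 256, 256)
rgb_like = thermal.repeat(1, 3, 1, 1)

# with labels 0 = ignored, 1 = non-defective, 2 = defective, n_class would be 3;
# adjust to your own label scheme
model = OCR(n_class=3, backbone=DummyBackbone(), feat_channels=[768, 1024])
```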
Yeah, sure. Thanks, I will try. Also, based on your broad knowledge of several models, what other models would you suggest would be appropriate to try? Or any modifications to the OCR model that would possibly be helpful? Thanks for your time.
I think UNet or FPN and DeepLab v3+ will be simple but powerful baselines. If I used OCR, then I would try to use more powerful backbones, or use decoder-like approaches (for example, concatenating stride-4 features) such as in DeepLab v3+.
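A rough sketch of what that decoder-like concatenation could look like (my own illustration using the conv2d helper from above, not code from this repo): the deep features are upsampled to the stride-4 resolution and concatenated with the compressed shallow features before the classification head.

```python
import torch
from torch import nn
from torch.nn import functional as F

class SimpleDecoder(nn.Module):
    def __init__(self, deep_channel=512, shallow_channel=256, n_class=2):
        super().__init__()
        self.reduce = conv2d(shallow_channel, 48, 1)   # compress stride-4 features
        self.fuse = conv2d(deep_channel + 48, 256, 3)
        self.out = nn.Conv2d(256, n_class, 1)

    def forward(self, deep, shallow):
        # upsample deep features to the stride-4 resolution and concatenate
        deep = F.interpolate(
            deep, size=shallow.shape[2:], mode='bilinear', align_corners=False
        )
        return self.out(self.fuse(torch.cat([deep, self.reduce(shallow)], 1)))
```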
Hello Rosinality,
In Table 4 of the article, you mentioned a comparison with other methods. Question 1) Did you actually implement all of those methods yourself, or did you just compare against their reported results?
I am interested in CC-Attention, Self-Attention, and Double Attention.
Question 2) If you implemented these three methods yourself, could you please share your code?
Thank you
Kind regards, Rahmat
Sorry, I'm not the author of the paper.
Thanks, Rosinality.
Hi rosinality, could you please share your email?
Hi Rosinality,
Could you please explain how to apply global weighted average pooling in PyTorch? For your reference, the article which discusses it is cited below. Thank you
```bibtex
@inproceedings{kolesnikov2016seed,
  title={Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation},
  author={Kolesnikov, Alexander and Lampert, Christoph H.},
  booktitle={European Conference on Computer Vision ({ECCV})},
  year={2016},
  organization={Springer}
}
```
```python
pooled = (weight * input).sum([2, 3], keepdim=True)
```
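Expanding that one-liner into a hedged, self-contained sketch (the normalization step is my addition so the result is a weighted mean; the paper's exact formulation may differ):

```python
import torch

input = torch.randn(2, 512, 8, 8)   # feature maps
weight = torch.rand(2, 1, 8, 8)     # e.g. a predicted per-pixel score map

# normalize so the pooled value is a weighted average rather than a weighted sum
norm = weight / (weight.sum([2, 3], keepdim=True) + 1e-8)
pooled = (norm * input).sum([2, 3], keepdim=True)  # (2, 512, 1, 1)
```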