diffpool
batch normalization
I am a little confused by the batch normalization implementation here: every time batch norm is applied via self.bn(x), a new bn layer is created, i.e., bn_module = nn.BatchNorm1d(x.size()[1]).cuda(). Will this fail to train the bn parameters, since the layer is created anew on each forward pass?
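To illustrate the concern: a layer constructed inside forward() is a local variable, so it is never registered as a submodule and the optimizer never sees its parameters. This is a minimal hypothetical sketch of the pattern being questioned (names are illustrative, not the actual diffpool code; .cuda() is omitted for portability):

```python
import torch
import torch.nn as nn

class BadBN(nn.Module):
    # Hypothetical minimal module mirroring the pattern under discussion:
    # a fresh BatchNorm1d is constructed on every forward call.
    def forward(self, x):
        bn_module = nn.BatchNorm1d(x.size()[1])  # new, unregistered layer
        return bn_module(x)

model = BadBN()
# The freshly created bn layer is never registered on the module,
# so the optimizer has no BatchNorm parameters to train:
print(len(list(model.parameters())))  # 0
```

Each call also starts from freshly initialized affine weights and running statistics, so nothing learned in one step carries over to the next.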
I am also confused about this... It seems that since x.size(1) (that is, the maximum graph size in a batch) varies from batch to batch, the size of the BatchNorm layer cannot be fixed. Even so, this usage of BatchNorm still seems problematic to me. Hoping to get more explanation about this!
Hi,
Thanks for pointing this out. I think batch norm is quite confusing for GNNs, and what you said makes sense. I pushed a new version in which the bn layer's trainable parameters are registered. It is confusing not only because of the varying graph size, but also because I'm not sure whether 1d or 2d batch norm makes more sense; there is also the problem that there is no alignment between nodes of different graphs, so we cannot normalize along the node axis. Performance-wise I don't see a difference. I'm still working on improving batch norm for GNNs in general.
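A hedged sketch of the registered-parameter fix described above (the class and variable names are illustrative, not taken from the repo): the bn layer is created once in __init__ with a fixed feature dimension, so its weight, bias, and running statistics are tracked by the module and updated by the optimizer.

```python
import torch
import torch.nn as nn

class GNNLayerWithBN(nn.Module):
    # Illustrative layer: BatchNorm1d is registered once in __init__,
    # sized by the (fixed) hidden feature dimension rather than by the
    # per-batch graph size, so its parameters persist across steps.
    def __init__(self, hidden_dim):
        super().__init__()
        self.lin = nn.Linear(hidden_dim, hidden_dim)
        self.bn = nn.BatchNorm1d(hidden_dim)

    def forward(self, x):
        # x: (num_nodes, hidden_dim) - normalization runs over the
        # feature axis, which does not vary with graph size.
        return torch.relu(self.bn(self.lin(x)))

layer = GNNLayerWithBN(16)
# bn's affine parameters (weight, bias) now appear alongside the
# linear layer's, so an optimizer will update them:
print(sum(1 for _ in layer.parameters()))  # 4
```

Normalizing over features also sidesteps the node-alignment problem mentioned above, since no statistic is computed along the node axis.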
Thanks again for raising this issue!
Rex