diffpool
batch normalization
I am a little confused by the batch normalization implementation here: every time batch norm is applied via self.bn(x), a new bn layer is created, i.e., bn_module = nn.BatchNorm1d(x.size()[1]).cuda(). Will this fail to train the bn parameters, since the layer is created anew on each forward pass?
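To illustrate the concern: a layer constructed inside forward() is a local variable, so it is never registered as a submodule and the optimizer never sees its parameters. This is a minimal hypothetical sketch of the pattern being questioned (names are illustrative, not the actual diffpool code; .cuda() is omitted for portability):

```python
import torch
import torch.nn as nn

class BadBN(nn.Module):
    # Hypothetical minimal module mirroring the pattern under discussion:
    # a fresh BatchNorm1d is constructed on every forward call.
    def forward(self, x):
        bn_module = nn.BatchNorm1d(x.size()[1])  # new, unregistered layer
        return bn_module(x)

model = BadBN()
# The freshly created bn layer is never registered on the module,
# so the optimizer has no BatchNorm parameters to train:
print(len(list(model.parameters())))  # 0
```

Each call also starts from freshly initialized affine weights and running statistics, so nothing learned in one step carries over to the next.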
I am also confused about this... It seems that since x.size(1) (that is, the maximum graph size in a batch) varies from batch to batch, the size of the BatchNorm layer cannot be fixed. Even so, this usage of BatchNorm still seems problematic to me. Hoping to get more explanation about this!
Hi,
Thanks for pointing this out. I think batch norm is quite confusing for GNNs, and what you said makes sense. I pushed a new version in which the bn layer's trainable parameters are registered. It is confusing not only because of the varying graph size, but also because I'm not sure whether 1d or 2d batch norm makes more sense; there is also the problem that there is no alignment between nodes of different graphs, so we cannot normalize along the node axis. Performance-wise I don't see a difference. I'm still working on improving batch norm for GNNs in general.
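A hedged sketch of the registered-parameter fix described above (the class and variable names are illustrative, not taken from the repo): the bn layer is created once in __init__ with a fixed feature dimension, so its weight, bias, and running statistics are tracked by the module and updated by the optimizer.

```python
import torch
import torch.nn as nn

class GNNLayerWithBN(nn.Module):
    # Illustrative layer: BatchNorm1d is registered once in __init__,
    # sized by the (fixed) hidden feature dimension rather than by the
    # per-batch graph size, so its parameters persist across steps.
    def __init__(self, hidden_dim):
        super().__init__()
        self.lin = nn.Linear(hidden_dim, hidden_dim)
        self.bn = nn.BatchNorm1d(hidden_dim)

    def forward(self, x):
        # x: (num_nodes, hidden_dim) - normalization runs over the
        # feature axis, which does not vary with graph size.
        return torch.relu(self.bn(self.lin(x)))

layer = GNNLayerWithBN(16)
# bn's affine parameters (weight, bias) now appear alongside the
# linear layer's, so an optimizer will update them:
print(sum(1 for _ in layer.parameters()))  # 4
```

Normalizing over features also sidesteps the node-alignment problem mentioned above, since no statistic is computed along the node axis.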
Thanks again for raising this issue!
Rex