Results 11 comments of ChenyangSi

@snowcement Yes, the outputs of Conv, maxpool, and attention have different channel dimensions. We first apply a 2D FFT with fft2() to each channel, then apply channel-wise average pooling.
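
For anyone trying to reproduce this, here is a rough sketch of the two steps rather than the exact script. It makes several assumptions: NumPy's `np.fft.fft2` is used, the feature map is laid out as `(C, H, W)`, the magnitude of the spectrum is taken before pooling, and "channel-wise average pooling" means averaging over the channel axis. The function name `channel_averaged_spectrum` is hypothetical.

```python
import numpy as np


def channel_averaged_spectrum(feature_map):
    """Return an (H, W) spectrum from a (C, H, W) feature map.

    Assumptions (not confirmed by the original comment): the magnitude
    is taken before pooling, and pooling is a mean over the channel axis.
    """
    # 2D FFT over the spatial axes, computed independently for each channel.
    spectrum = np.fft.fft2(feature_map, axes=(-2, -1))
    # Magnitude of the complex spectrum (assumed step).
    magnitude = np.abs(spectrum)
    # Average over channels, so any channel count C gives an (H, W) map.
    return magnitude.mean(axis=0)


# Feature maps with different channel counts yield same-sized spectra.
rng = np.random.default_rng(0)
conv_out = rng.standard_normal((64, 8, 8))   # hypothetical Conv output
attn_out = rng.standard_normal((32, 8, 8))   # hypothetical attention output
print(channel_averaged_spectrum(conv_out).shape)
print(channel_averaged_spectrum(attn_out).shape)
```

Because the channel axis is averaged away, the resulting maps can be compared directly even though the three outputs have different channel dimensions.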