
Possible mismatch between supplementary section 1.5 and the implementation?

Open crysoberil opened this issue 4 years ago • 6 comments

The initialization of the first layer, as implemented here:

https://github.com/vsitzmann/siren/blob/ecd150f99b40217d76e0f15753b856aa2d966ab1/modules.py#L629-L634

This does not apply a square root over the fan-in of the layer. Am I missing something in the paper?

crysoberil avatar Jun 26 '20 01:06 crysoberil

Edit: I'll leave this issue open so other people can see it.

No, we actually got this wrong in the paper: the implementation is correct. This magnitude of weights in the first layer is appropriate for images; the one in the paper is too large! We'll fix it in the next version. In general, the initialization of the first layer depends on the frequencies of the signal: higher frequencies require larger weights in the first layer. See, for instance, the audio section in the Colab, where we set omega_0 to 3000!

vsitzmann avatar Jun 26 '20 02:06 vsitzmann
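
For reference, the two initialization rules under discussion can be sketched as follows. This is a minimal numpy sketch of the scheme described in the thread (the function name `siren_init` and the default `omega_0=30` are illustrative; the authoritative version is the linked `modules.py`):

```python
import numpy as np

def siren_init(fan_in, fan_out, omega_0=30.0, is_first=False, rng=None):
    """Sketch of the SIREN weight init discussed above.

    First layer:   W ~ U(-1/fan_in, 1/fan_in)                  (no square root)
    Hidden layers: W ~ U(-sqrt(6/fan_in)/omega_0, +sqrt(6/fan_in)/omega_0)
    """
    rng = rng or np.random.default_rng(0)
    if is_first:
        bound = 1.0 / fan_in                    # the point raised in this issue
    else:
        bound = np.sqrt(6.0 / fan_in) / omega_0  # as in supplementary 1.5
    return rng.uniform(-bound, bound, size=(fan_out, fan_in))

# e.g. a 2D-coordinate input layer and a 256-wide hidden layer
W_first = siren_init(fan_in=2, fan_out=256, is_first=True)
W_hidden = siren_init(fan_in=256, fan_out=256)
print(np.abs(W_first).max() <= 1 / 2)                    # True
print(np.abs(W_hidden).max() <= np.sqrt(6 / 256) / 30.0)  # True
```

Note the first-layer bound `1/fan_in` has no square root, which is exactly the paper-vs-code mismatch this issue is about.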

In general, the initialization of the first layer is dependent on the frequencies of the signal

The highest frequency in the signal is related to its sampling resolution. Ideally you'd have to invoke Shannon's theorem to determine the appropriate frequency range.

You shouldn't pick omega_0 from a heuristic.

grondilu avatar Jun 26 '20 11:06 grondilu
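
The sampling-rate argument above can be made concrete: any frequency above half the sampling rate (the Nyquist frequency) produces exactly the same samples as some lower frequency. A toy numpy demonstration (the numbers are arbitrary, chosen only for illustration):

```python
import numpy as np

fs = 8.0                       # sampling rate (Hz)
t = np.arange(16) / fs         # 16 samples on a uniform grid
f_low = 1.0                    # below Nyquist (fs/2 = 4 Hz)
f_alias = f_low + fs           # above Nyquist; aliases onto f_low

x1 = np.sin(2 * np.pi * f_low * t)
x2 = np.sin(2 * np.pi * f_alias * t)
print(np.allclose(x1, x2))    # True: the samples cannot tell them apart
```

So from the samples alone, the highest recoverable frequency is bounded by the sampling resolution, which is the premise of the comment above.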

From Supplemental section 1.5:

This keeps the distribution of activations constant, but boosts gradients to the weight matrix, W, by the factor, ω0, while leaving gradients w.r.t. the input of the sine neuron unchanged.

Does scaling up the learning rate accomplish the same thing? If not, what's the difference between these two hyperparameters?

Experimentally, the losses follow a similar trajectory when scaling up either omega_0 or scaling up the learning rate by the same factor (in the latter case, I removed omega_0 scaling from everywhere except the first layer's activations). As well, the visual outputs show similar progress. I used the cameraman ImageFitting training procedure from the linked Colab notebook.

dcato98 avatar Jun 27 '20 14:06 dcato98
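
The "boosts gradients by the factor omega_0" claim from the quoted supplement can be checked on a single scalar sine neuron: d/dw sin(omega_0 * w * x) = omega_0 * x * cos(omega_0 * w * x), so for small arguments the gradient scales roughly linearly in omega_0, which is why rescaling the learning rate behaves similarly. A toy sketch (scalar example, not the SIREN code):

```python
import numpy as np

def grad_w(omega_0, w, x):
    # d/dw sin(omega_0 * w * x) = omega_0 * x * cos(omega_0 * w * x)
    return omega_0 * x * np.cos(omega_0 * w * x)

w, x = 0.01, 0.5
g1 = grad_w(1.0, w, x)
g30 = grad_w(30.0, w, x)
print(g30 / g1)  # close to 30 while omega_0 * w * x stays small
```

The equivalence is only approximate: once omega_0 * w * x grows, the cos factor differs between the two settings, whereas a learning-rate change rescales the update exactly.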

The highest frequency in the signal is related to its sampling resolution.

This statement is incorrect. The highest possible frequency that is not aliased in the sampled signal is the Nyquist frequency. It is not the maximum frequency that is present in the underlying, ground-truth signal. You could also have a low frequency sine wave that is sampled at very high resolution, in which case you want the initialization to reflect the intrinsic low frequency, instead of the Nyquist frequency.

If we applied your principle, then superresolution as an application would be impossible. It may very well be that it is attractive to learn a prior over frequencies that are higher than the Nyquist frequency in order to perform, for instance, superresolution.

This is also related to the idea of getting rid of discrete grids - we want to match the intrinsic spectrum of the signal, not the spectrum of the sampled signal.

vsitzmann avatar Jun 27 '20 19:06 vsitzmann
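
The distinction drawn here, intrinsic spectrum versus Nyquist limit, is easy to see numerically: a slow sine sampled very densely has a Nyquist frequency far above anything present in the signal. A small numpy sketch (frequencies chosen arbitrarily for illustration):

```python
import numpy as np

fs, n = 1000.0, 1000           # 1 kHz sampling -> Nyquist = 500 Hz
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 2.0 * t)  # intrinsic frequency: only 2 Hz

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1 / fs)
peak = freqs[np.argmax(spectrum)]
print(peak)  # 2.0 -- the spectrum reflects the signal, not the Nyquist limit
```

An initialization tuned to the Nyquist frequency (500 Hz) would be wildly too large for this signal; matching the intrinsic 2 Hz content is the point being made above.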

If I use this sine activation in a CNN for a simple task, such as image classification, should I just follow the initialization in your code, not the paper? I was confused about this.

MARD1NO avatar Jun 30 '20 11:06 MARD1NO

If I use this sine activation in a CNN for a simple task, such as image classification, should I just follow the initialization in your code, not the paper? I was confused about this.

Same question here. Have you tried that?

ZhengdiYu avatar Jun 30 '21 02:06 ZhengdiYu