escnn

Model Initialization Extremely slow

Open dc250601 opened this issue 1 year ago • 2 comments

Is there a way to speed up model initialization? Every time I create the model, initialization takes over 30 minutes before training starts.

dc250601 avatar May 21 '23 13:05 dc250601

Hi @dc250601

Unfortunately, this can happen for wide models. It is caused by the slow computation of the variances needed for He weight initialization.

To speed this up, these variances can be cached so that subsequent layers using the same basisexpansion / basissampler / basismanager do not need to recompute them. Caching also helps if you train your model multiple times in a row, since the variances only need to be computed the first time.
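The caching idea can be illustrated without escnn. The following is a minimal sketch: `expensive_variance`, `init_layers`, and the tuple-shaped basis keys are hypothetical stand-ins invented for illustration and are not part of the escnn API.

```python
import functools

calls = {"count": 0}  # counts how often the slow path actually runs


@functools.lru_cache(maxsize=None)
def expensive_variance(basis_key):
    """Hypothetical stand-in for the slow per-basis variance computation."""
    calls["count"] += 1
    name, multiplicity = basis_key
    # Toy deterministic value; the real computation is far more expensive.
    return 1.0 / (len(name) + multiplicity)


def init_layers(layer_basis_keys):
    """Return one variance per layer, reusing cached results for repeated bases."""
    return [expensive_variance(key) for key in layer_basis_keys]


# Three layers, two of which share the same basis description.
variances = init_layers([("irrep5", 128), ("irrep5", 128), ("irrep3", 64)])
```

Only two of the three layers trigger the slow computation here; the second layer reuses the result stored for the first.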

The R3Conv (and R2Conv) constructor calls the initialization method with cached=False by default, so no caching is performed. However, you can pass initialize=False to the constructor to skip the automatic initialization entirely, and then call generalized_he_init manually with cached=True.
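The workaround described above might look roughly like the following. This is a sketch under assumptions, not the library's documented recipe: the helper name `reinit_with_cached_he` is hypothetical, it assumes each conv layer exposes `weights` and `basisexpansion` attributes, and it passes the caching flag positionally as the third argument because the exact keyword name (called `cached` in this thread) has not been checked against the installed escnn version.

```python
def reinit_with_cached_he(model, init_fn=None):
    """Re-initialize every equivariant conv layer in `model` with caching enabled.

    Hypothetical helper. Returns the number of layers that were initialized.
    """
    if init_fn is None:
        # Imported lazily so the helper can be defined without escnn installed.
        from escnn.nn import init
        init_fn = init.generalized_he_init

    initialized = 0
    for module in model.modules():
        # Assumption: conv layers expose `weights` and `basisexpansion`.
        if hasattr(module, "weights") and hasattr(module, "basisexpansion"):
            # Third positional argument is the caching flag (assumed position).
            init_fn(module.weights.data, module.basisexpansion, True)
            initialized += 1
    return initialized
```

Typical use, assuming the layers were built with `initialize=False` (for example `R2Conv(in_type, out_type, kernel_size=3, initialize=False)`), would be to call `reinit_with_cached_he(model)` once after constructing the model and before training.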

Alternatively, you could try the delta-orthogonal initialization, which I think is a bit faster. As before, you'll have to disable the automatic initialization within the conv layers by using initialize=False and then call this initialization method manually.

Let me know if these solutions work for you!

Best, Gabriele

Gabri95 avatar Jun 22 '23 16:06 Gabri95

So the He initialization is why things are slow if I have a convolution with lots of input channels and output channels? And I suppose that would be true even if I have many duplicates of the same "kind" of channels (i.e. 128 irrep(5) channels as input and 128 irrep(5) channels as output)?

Or is there some other kind of "width" that would explain the slowness? Like maybe by width do you mean kernel size? Basically my question is: what do you mean by "wide" model?

jacksonloper avatar Dec 03 '23 02:12 jacksonloper