
Implementing Leaky Relu, parametric and other forms of Relu

naveenjafer opened this issue on Mar 10, 2020 · 4 comments

I am working on an implementation of LeakyRelu and would like some input on how to go about it. There are two options.

  1. A separate layer named LeakyRelu, ParamRelu, etc. for each of the Relu variations.
  2. A single Relu layer that takes optional params and implements the variations. (This would greatly reduce duplicated code, but it also reduces the visibility of these variants to end users unless they spend some time in the documentation.)

Keras and PyTorch seem to have separate layers for each of the Relu variations, but I am more inclined towards a single Relu with the right parameters. What would you suggest?
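
For context, the math under discussion is tiny: leaky and parametric ReLU scale negative inputs by a slope alpha instead of zeroing them, and alpha = 0 recovers the plain ReLU. A minimal NumPy sketch, independent of any particular Thinc API:

```python
import numpy as np

def leaky_relu(X, alpha=0.01):
    # Forward pass: keep positive values, scale negative values by alpha.
    # alpha=0.0 gives the standard ReLU; alpha=0.01 is a common "leaky" default.
    return np.where(X > 0, X, alpha * X)

def leaky_relu_grad(X, dY, alpha=0.01):
    # Backward pass: the gradient is 1 for positive inputs and alpha elsewhere.
    return np.where(X > 0, dY, alpha * dY)
```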

naveenjafer avatar Mar 10 '20 08:03 naveenjafer

Thanks for the question, I think it's definitely something to think about.

Currently in Thinc we use a single layer definition for the weights and the activation. This helps to set the initialization defaults a little bit smarter, because the choice of activation typically impacts the best initialization strategy. This does make it awkward to keep accumulating these activation variants though.

  • How many variants would we want?
  • Are they all important to have, or are some strictly inferior?
  • Do people mostly think of them as the same activation (relu), or do they think of it as a different thing?

Another awkward problem with putting it all in the Relu layer is defaults. Presumably if people do use LeakyRelu they mostly use the same leak parameter, right? We can't have a helpful default for that if we instead default that parameter to 0. And I don't want to have both a flag and a separate parameter for the leak.
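
To make the defaults concern concrete, here is the tension sketched as two hypothetical signatures (the names and defaults are illustrative, not Thinc's actual API):

```python
# Separate layer: the leak can get a sensible non-zero default.
def LeakyRelu(nO=None, nI=None, *, alpha=0.01):
    ...

# Combined layer: alpha has to default to 0 so that plain Relu() keeps its
# current behaviour, which means users must always pass the leak explicitly
# to get a leaky variant.
def Relu(nO=None, nI=None, *, alpha=0.0):
    ...
```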

honnibal avatar Mar 11 '20 10:03 honnibal

@honnibal The regular Relu is just a special case of Leaky Relu where the alpha parameter is 0. So what I have done for now is keep the default at 0. When users need a LeakyRelu, they write:
Relu(alphaLeaky=0.1)
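
Just to make the proposal concrete, here is a minimal sketch of the activation part on its own, assuming Thinc's v8 Model/forward convention. The alphaLeaky name mirrors the snippet above and is hypothetical; the actual proposal is to fold the parameter into the existing combined Relu layer, which also owns the weights and initialization:

```python
from thinc.api import Model

def LeakyRelu(alphaLeaky: float = 0.0) -> Model:
    # alphaLeaky=0.0 reproduces the plain Relu activation.
    return Model("leaky_relu", forward, attrs={"alphaLeaky": alphaLeaky})

def forward(model: Model, X, is_train: bool):
    alpha = model.attrs["alphaLeaky"]
    mask = X > 0
    Y = model.ops.xp.where(mask, X, alpha * X)

    def backprop(dY):
        # Gradient is 1 where the input was positive, alpha elsewhere.
        return model.ops.xp.where(mask, dY, alpha * dY)

    return Y, backprop
```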

But again, this might bloat up or conflict if someone later wants to implement the other Relus, or whatever future variations come along: https://keras.io/layers/advanced-activations/

Yeah, keeping both the flag and the param is a terrible idea. In some cases we can do away with an explicit flag and infer the behaviour from the params, but again, things might conflict in the future.

naveenjafer avatar Mar 11 '20 12:03 naveenjafer

Hi @honnibal, any update on this? I would love to complete this with all the extra time the lockdowns are giving us.

naveenjafer avatar Mar 28 '20 14:03 naveenjafer

Hey @naveenjafer,

We have not implemented parametric ReLU functions, but we have added a number of activations since then (a few textbook definitions are sketched after the list):

  1. Swish
  2. Gelu
  3. Dish (our custom, more efficient Swish variant that uses sqrt instead of exp)
  4. HardSwish
  5. HardSwishMobilenet
  6. HardSigmoid
  7. HardTanh
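
A few of these have well-known textbook definitions; the rough NumPy sketch below is for orientation only and may differ from Thinc's actual implementations. Dish and the HardSwish variants are omitted to avoid guessing at their exact constants.

```python
import numpy as np

def swish(x):
    # Swish: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def gelu_approx(x):
    # Common tanh approximation of GELU.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def hard_tanh(x):
    # HardTanh: clip to [-1, 1].
    return np.clip(x, -1.0, 1.0)
```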

kadarakos avatar Aug 19 '22 14:08 kadarakos