
Zoneout support

Open bratao opened this issue 5 years ago • 2 comments

Hello,

This library is vital to our pipeline; we got a great speedup and performance improvement compared to LSTM. Thanks @taolei87 and the asappresearch team.

One thing I have been experimenting with in the (also great) haste library (https://github.com/lmnt-com/haste) is its Zoneout (https://arxiv.org/abs/1606.01305) support. It improved all our metrics compared to regular dropout.

Is this something that could make it into SRU?

bratao • Oct 09 '20 17:10

Hi @bratao ,

Thank you, and it's great to hear SRU is useful for you.

Could you please share more details, such as:

  1. How much of an improvement did you observe using zoneout, and for what NLP task(s)?
  2. Did you use a binary mask or a float mask for the zoneout implementation (new_c[t] = c[t] * mask + c[t-1] * (1 - mask))?
  3. Is the shape of the mask per layer (length, batch_size) or (length, batch_size, hidden_size)? That is, does a mask value apply to an entire vector or to a single dimension of the vector?
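
To make questions 2 and 3 concrete, here is a minimal sketch of a single zoneout step in plain Python. It is not code from SRU or haste: the function name `zoneout_step` and its parameters are hypothetical, and it assumes the update is the convex combination new_c[t] = c[t] * mask + c[t-1] * (1 - mask), with a binary mask during training and the expected (float) mask at inference, as described in the Zoneout paper.

```python
import random


def zoneout_step(c_prev, c_new, zoneout_prob, per_dimension=True,
                 training=True, rng=random):
    """Hypothetical helper: one zoneout step for one example.

    c_prev and c_new are lists of floats of length hidden_size.
    mask = 1 keeps the newly computed value, mask = 0 carries over
    the value from the previous time step.
    """
    if not 0.0 <= zoneout_prob <= 1.0:
        raise ValueError("zoneout_prob must be in [0, 1]")
    if len(c_prev) != len(c_new):
        raise ValueError("c_prev and c_new must have the same length")

    hidden_size = len(c_new)
    if training:
        if per_dimension:
            # One binary mask value per hidden dimension,
            # i.e. a mask of shape (length, batch_size, hidden_size).
            mask = [0.0 if rng.random() < zoneout_prob else 1.0
                    for _ in range(hidden_size)]
        else:
            # One binary mask value shared by the whole vector,
            # i.e. a mask of shape (length, batch_size).
            shared = 0.0 if rng.random() < zoneout_prob else 1.0
            mask = [shared] * hidden_size
    else:
        # Inference: use the expected value of the mask (a float mask).
        mask = [1.0 - zoneout_prob] * hidden_size

    # new_c = c_new * mask + c_prev * (1 - mask)
    return [m * n + (1.0 - m) * p for m, n, p in zip(mask, c_new, c_prev)]
```

With `per_dimension=False` the whole state vector is either updated or carried over as a unit; with `per_dimension=True` each dimension makes that choice independently.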

taoleicn • Oct 09 '20 18:10

Hi @taolei87 ,

  1. I do sequence labeling on super-long, almost infinite sequences from legal documents. Almost all RNNs overfit easily because my training set is small. On my task, the best result using SRU after an exhaustive grid search over the hyperparameters is an F1 of 0.97. PyTorch LSTM also gets an F1 of 0.97, but it is slower. Using haste LSTM with zoneout I can get an F1 of 0.98 (0.97 without).

2-3. I do not know, I just used what haste did.

bratao • Oct 09 '20 20:10