Consolidate all dropout parameters into a single parameter applied globally across the whole model
Currently, Ludwig defines separate dropout parameters for different components within a compositional module. For example, the StackedCNNRNN module takes 4 different dropout parameters:
- conv_dropout: Dropout rate for the convolutional layers
- recurrent_dropout: Dropout rate for the recurrent layers
- dropout: Dropout rate for sequence embeddings
- fc_dropout: Dropout rate for the final fully connected output layer.
While this is highly expressive, it seems like configurability overkill: there isn't much evidence in the literature that heterogeneous dropout rates across a model yield much of a gain. Unifying all modules under a single dropout parameter seems to strike a better balance.
From a simplicity standpoint, it would be nice to have one global dropout param. That said, we shouldn't constrain users who have a particular use case (e.g. reproducing a paper result). Maybe we could structure it the way we handle preprocessing, where one can set preprocessing params for each individual feature, or set global preprocessing params used by all features of a given type.
In the dropout case, we would let people set the dropout param for each ECD component, but also expose a global dropout param that automatically sets the dropout param for all components.
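A minimal sketch of how that could work, using the four parameter names from the StackedCNNRNN example. `resolve_dropout` and `global_dropout` are hypothetical names for illustration, not existing Ludwig APIs. The sketch also assumes that an explicitly set per-component value takes precedence over the global one, which isn't settled above.

```python
from typing import Dict, Optional

# Dropout parameter names taken from the StackedCNNRNN example in this issue.
DROPOUT_KEYS = ("conv_dropout", "recurrent_dropout", "dropout", "fc_dropout")


def resolve_dropout(
    component_config: Dict[str, float],
    global_dropout: Optional[float] = None,
) -> Dict[str, float]:
    """Hypothetical helper (not part of Ludwig).

    Fills in every dropout parameter the user did not set explicitly with
    the global value. Assumption: explicit per-component values win.
    """
    resolved = dict(component_config)  # copy so the caller's config is untouched
    if global_dropout is None:
        # No global value given: leave the component config as-is.
        return resolved
    for key in DROPOUT_KEYS:
        resolved.setdefault(key, global_dropout)
    return resolved


# The user pins fc_dropout for this encoder; everything else follows the global rate.
encoder_config = {"fc_dropout": 0.5}
resolved = resolve_dropout(encoder_config, global_dropout=0.1)
print(resolved)
```

With this shape, users who only want one knob set the global value, and someone reproducing a paper can still pin individual rates.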