
RL Tuning Tips

Open smorad opened this issue 2 years ago • 5 comments

I'm currently writing a recurrent reinforcement learning library with LSTMs, linear attention, etc., that I would like to add S4 to. Unfortunately, I find S4D unable to learn even simple RL tasks (e.g. output the input from 4 timesteps ago).

Do you know of any configurations or tips for making S4/S4D models smaller/more robust? LR is already set very low (1e-5) and I'm using enormous batch sizes (65536 transitions per batch). Other recurrent models are able to learn, but not S4. This is what I have thus far:

class S4Wrapper(nn.Module):
    def __init__(...):
        ...
        self.core = S4Model(
            d_model=self.h,  # This is usually on the order of ~10-20
            d_state=self.h,
            mode="diag",
            measure="diag-lin",
            bidirectional=False,
            disc="zoh",
            real_type="exp",
            transposed=False,
        )
        self.core.setup_step()

    ...

    def forward(self, z, state):
        batch, time, feature = z.shape
        if time == 1:
            # Rollout/inference (recurrent mode): advance one timestep with the recurrent state
            z, state = self.core.step(z.reshape(batch, feature), state)
        else:
            # Train (batch mode): convolutional forward over the full sequence
            z, _ = self.core(z)
        return z, state

smorad avatar Sep 07 '22 12:09 smorad

It's hard for me to say more without understanding the setting more completely. At a glance, the model seems to be set up correctly. Some things I notice are:

  • Did you intend to use only 1 layer? Generally we always use deeper models with residual connections (see the sketch after this list). Or is your code just a snippet of 1 layer from a bigger model?
  • This model looks very tiny, at most a few thousand parameters. The model dimensions are very small compared to what we normally use (e.g. d_model=256 d_state=64). Are the baseline models of a similar size? Can LSTM and linear attention solve these tasks with ~1k parameters and 1 layer?
  • Is training done in convolution or recurrent mode? In the code as written, setup_step is called inside __init__; it's meant to be called after training and only for inference (I believe calling it there shouldn't affect anything, but I'm not 100% sure right now). Are training and evaluation both done in the same mode, for consistency?
  • Why does the LR need to be set so low? Is training unstable with a higher LR? I've almost always found that S4 can get away with a higher LR than LSTM and attention (I can't think of any exceptions, actually), so I'm not sure what would be causing instabilities.
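
For concreteness, here is a minimal sketch of the kind of deeper residual stack meant in the first point. It reuses the S4Model constructor call from the snippet above; the wrapper class name, layer count, widths, and the pre-norm LayerNorm placement are illustrative assumptions, not a prescribed recipe.

import torch.nn as nn

class ResidualS4Stack(nn.Module):
    # Hypothetical wrapper: several S4 blocks with pre-norm residual connections,
    # at sizes closer to what we normally use (d_model=256, d_state=64).
    def __init__(self, d_model=256, d_state=64, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList([
            S4Model(  # same constructor as in the snippet above
                d_model=d_model,
                d_state=d_state,
                mode="diag",
                measure="diag-lin",
                bidirectional=False,
                disc="zoh",
                real_type="exp",
                transposed=False,
            )
            for _ in range(n_layers)
        ])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])

    def forward(self, z):
        # z: (batch, time, d_model); each layer's output is added back to its input
        for layer, norm in zip(self.layers, self.norms):
            y, _ = layer(norm(z))
            z = z + y
        return z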

albertfgu avatar Sep 07 '22 19:09 albertfgu

Based on your comments, I trained S4 models with more layers and a larger d_model and d_state. I used convolution mode to update the parameters and recurrent mode to generate and collect trajectories. So far, there has been no significant improvement, and it is still worse than MLP. Do you have any other suggestions?

mrsamsami avatar Nov 04 '22 16:11 mrsamsami

It's very hard for me to say without knowing more details about the problem (and I'm not an expert in RL). If MLP is doing well, perhaps the problem is Markovian and doesn't need a sequence model at all. Another sanity check could be to try other recurrent baselines such as an LSTM/GRU core, which should have a similar interface to S4(D). I know of another RL project using S4(D)/S5 where they found it consistently much better than an LSTM, so this kind of baseline could reveal whether the discrepancy is in the problem setup (e.g. if you find MLP is better than any sequence model) or in the S4 usage specifically (e.g. if you find LSTM is better than S4).
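
As an illustration, a rough sketch of what such a baseline core could look like, keeping the same (batch, time, feature)-in / (output, state)-out interface as the S4 wrapper above (the class and argument names here are made up for the example):

import torch.nn as nn

class LSTMCore(nn.Module):
    # Illustrative drop-in baseline with the same interface as the S4 core above.
    def __init__(self, d_model):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)

    def forward(self, z, state=None):
        # z: (batch, time, d_model). The same call handles both time == 1 (rollout)
        # and full sequences (training), so no separate step() path is needed.
        z, state = self.lstm(z, state)
        return z, state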

albertfgu avatar Nov 04 '22 18:11 albertfgu

Hi Albert, given that people are trying S4D in RL settings, would it be possible to provide or extend the current minimal s4d.py so that it also has a step function and can be run (trained) in recurrent mode, in case people are doing it wrong?

MichaelFYang avatar Oct 04 '23 14:10 MichaelFYang

The minimal file is purposefully minimal. The documentation explains the additional features available in the main module and how to use them; see for example the step code: https://github.com/HazyResearch/state-spaces/blob/main/models/s4/s4.py#L1197
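
Roughly, the intended usage looks like the sketch below; setup_step and step are from the linked file, while the default_state call, shapes, and loop are my reading of it and should be checked against the code there.

# core is the full S4 module from models/s4/s4.py, built with transposed=False

# training: convolution mode over whole sequences
y, _ = core(u)                              # u: (batch, time, d_model)

# rollout/inference: recurrent mode, one timestep at a time
core.setup_step()
state = core.default_state(batch_size)      # assumed state initializer from s4.py
outputs = []
for t in range(u.shape[1]):
    y_t, state = core.step(u[:, t], state)  # u[:, t]: (batch, d_model)
    outputs.append(y_t)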

albertfgu avatar Oct 04 '23 16:10 albertfgu