Matthias Gerstgrasser
I got a possibly similar error just now on a distributed `tune.run()` / RLlib run. Is this the same issue? Any workaround? @matthewdeng ``` Traceback (most recent call last): File...
Ah, you mean `remove_padding_in_sequences()`? Wouldn't that still work with only right-padding?
Ah, no, to be clear, what I mean is the following: Right now, the padding is done like this ('promp' - a prompt token, 'respo' - a response token): ```...
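To make the layout above concrete, here is a minimal sketch (a toy example of my own, not OpenRLHF code) of a batch where a shorter prompt leaves pad tokens in the middle of its row, and what the compacted version without those pads would look like:

```python
import torch

PAD = 0
# Toy batch in the layout sketched above: prompt tokens (11, 12, 13) padded
# to a common prompt length, response tokens (21, 22) appended after, so a
# shorter prompt leaves a pad token in the middle of its row.
batch = torch.tensor([
    [11, 12, 13, 21, 22],   # 3 prompt tokens + 2 response tokens
    [11, 12, PAD, 21, 22],  # 2 prompt tokens + middle pad + 2 response tokens
])

def drop_pads(row: torch.Tensor) -> torch.Tensor:
    # Compact a single row by dropping pad positions; purely illustrative.
    return row[row != PAD]

print([drop_pads(r).tolist() for r in batch])
# [[11, 12, 13, 21, 22], [11, 12, 21, 22]]
```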
That's not what I am proposing though! What I mean is, if I return it without the pads in the middle from `_generate_vllm()`, would that break anything? (No worries if...
Ahhhh, got it, that makes sense. I think that's probably broken with local generation then! I just verified that the generated output doesn't have an EOS if max_tokens is reached. Also, would taking...
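For reference, one possible workaround along the lines discussed here would be to force a trailing EOS onto truncated rows. A minimal sketch, assuming PyTorch tensors of token ids (`force_final_eos` is a hypothetical helper, not existing OpenRLHF code):

```python
import torch

def force_final_eos(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # If a row contains no EOS at all (i.e. generation stopped because
    # max_tokens was reached), overwrite its last position with EOS so
    # downstream code that keys off the EOS token still finds one.
    sequences = sequences.clone()
    has_eos = (sequences == eos_token_id).any(dim=1)
    sequences[~has_eos, -1] = eos_token_id
    return sequences
```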
> I am not sure which approach to take at the moment, but our current implementation is heavily dependent on EOS tokens.

You mean specifically for the RM? Or more...
Oh, for local generation, does `actor.process_sequences()` do the same thing? https://github.com/OpenLLMAI/OpenRLHF/blob/bed10e115523a9eca419cb058ede8e531d23c182/openrlhf/models/actor.py#L159 If so, then doing this in `RemoteExperienceMaker` seems unnecessary, since it also calls `actor.process_sequences()` later anyway, i.e. this is...
@hijkzzz Could I ask a quick related question: In `actor.process_sequences()` I also see that `attention_mask` is set to False on all EOS tokens, except the final EOS token in each...
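For anyone reading along, this is my understanding of the masking behavior in question, re-implemented as a standalone sketch (the function below is my own illustration, not the actual `actor.process_sequences()` code):

```python
import torch

def mask_non_final_eos(sequences: torch.Tensor, eos_token_id: int) -> torch.Tensor:
    # Attend to everything, then mask out every EOS position except the
    # last EOS in each row.
    is_eos = sequences == eos_token_id
    mask = ~is_eos
    # Index of the last EOS per row: flip, find the first True, map back.
    last_eos = sequences.size(1) - 1 - is_eos.flip(dims=[1]).int().argmax(dim=1)
    rows = torch.arange(sequences.size(0))
    has_eos = is_eos.any(dim=1)
    mask[rows[has_eos], last_eos[has_eos]] = True
    return mask

seqs = torch.tensor([[5, 2, 7, 2, 0],    # 2 = EOS, appears twice
                     [9, 9, 9, 9, 9]])   # no EOS at all
print(mask_non_final_eos(seqs, eos_token_id=2))
# tensor([[ True, False,  True,  True,  True],
#         [ True,  True,  True,  True,  True]])
```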
> I would like to know what kind of `ExperienceMaker` you need at first

I had multiple different ones in mind, actually, for different projects. For instance:

* reward...
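To sketch what I mean by different variants: something like a base class with an overridable reward hook would cover several of these. All names below (`BaseExperienceMaker`, `compute_reward`) are hypothetical, not the actual OpenRLHF API:

```python
import torch

class BaseExperienceMaker:
    # Hypothetical customization point; the real ExperienceMaker API differs.
    def compute_reward(self, sequences: torch.Tensor) -> torch.Tensor:
        raise NotImplementedError

class HeuristicRewardExperienceMaker(BaseExperienceMaker):
    # Variant where the reward comes from a custom function instead of an RM.
    def compute_reward(self, sequences: torch.Tensor) -> torch.Tensor:
        # Toy heuristic for illustration: reward non-pad length.
        return (sequences != 0).float().sum(dim=1)
```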
OK! I'll test this on my side for a bit to make sure it covers all the use cases I have in mind. I'll open a PR in a couple...