aghyad-deeb
I want to point out that, both empirically and from inspecting the code, the HF implementation of Flex Attention appears to support sinks for both the forward and backward passes,...
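For context, here is a minimal standalone sketch of the sink mechanism itself (my own re-implementation for illustration, not the HF kernel): each head carries a learnable sink logit that joins the softmax but contributes no value, so attention weights over real tokens sum to less than one. Because it is plain tensor ops, autograd gives the backward pass for free.

```python
import torch

def attention_with_sinks(q, k, v, sinks):
    # q, k, v: (batch, heads, seq, dim); sinks: (heads,) learnable logits.
    scores = (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5  # (B, H, S, S)
    # Append the per-head sink logit as an extra "column" in the softmax.
    sink = sinks.view(1, -1, 1, 1).expand(*scores.shape[:-1], 1)
    probs = torch.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    # Drop the sink column: it absorbs probability mass but emits no value.
    return probs[..., :-1] @ v
```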
I also want to point out that the latest released version of vLLM doesn't support LoRA for GPT OSS; it applies LoRA only to the attention layers, which causes a discrepancy between...
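To make the discrepancy concrete, here is a hedged sketch of the mismatch (the module names are illustrative, not taken from vLLM or the GPT OSS repo): if training targeted both attention and MLP projections but serving only applies adapters to attention, the MLP deltas are silently dropped.

```python
from peft import LoraConfig

# Hypothetical training-time config: LoRA on attention *and* MLP projections.
train_lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# If the serving stack only applies adapters to attention projections,
# the MLP deltas learned during training are ignored, so train-time and
# serve-time logits no longer match.
served_modules = {"q_proj", "k_proj", "v_proj", "o_proj"}
dropped = set(train_lora.target_modules) - served_modules
print(f"Adapter weights ignored at serving time: {sorted(dropped)}")
```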
I was getting a similar error when doing GRPO training with async reward computation enabled in the config. The following edit fixed it. Add this:

```python
reward_fn = load_reward_manager(
    config,...
```
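For reference, the full call looked roughly like the sketch below. This assumes verl's `load_reward_manager(config, tokenizer, num_examine, **reward_kwargs)` signature and its `verl.trainer.ppo.reward` location; check your installed version, as the arguments and import path may differ.

```python
# Sketch under the assumptions above; not a verbatim copy of my diff.
from verl.trainer.ppo.reward import load_reward_manager

reward_fn = load_reward_manager(
    config,
    tokenizer,
    num_examine=0,  # how many decoded samples to print for inspection
    **config.reward_model.get("reward_kwargs", {}),
)
```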
@MichaelRipa Thanks for your reply! In case this is helpful, here's why I think it's particularly useful to have support for `StoppingCriteria`. When using the `remote=True` option, there seems to...
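For concreteness, here is a minimal example of the HF `StoppingCriteria` API being argued for; the stop condition itself is just illustrative:

```python
import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnToken(StoppingCriteria):
    """Stop generation once a chosen token id appears (illustrative)."""
    def __init__(self, stop_token_id: int):
        self.stop_token_id = stop_token_id

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor, **kwargs) -> bool:
        return bool((input_ids[0, -1] == self.stop_token_id).item())

# Passed to generate() as usual; the point above is that a remote=True
# code path would need to forward this object too.
# outputs = model.generate(**inputs,
#     stopping_criteria=StoppingCriteriaList([StopOnToken(stop_id)]))
```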