Yu Chin Fabian Lim
It's been a pleasure @muellerzr! I have just pulled the latest changes from `main`!
Hi @amyeroberts, looking forward to your review! If there is anything I can address, please feel free to let me know. FYI: @muellerzr
@amyeroberts I pulled `main` again and have [updated the code](https://github.com/huggingface/transformers/pull/29589/commits/968d4154750c9041bacd077f0855aaf476057d6d) to conform to @muellerzr's changes in https://github.com/huggingface/transformers/pull/29779
@ehartford But merging has to be done in higher precision. Doesn't that defeat the purpose of keeping the base weights in low precision to speed up inference?
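A minimal sketch of the round trip I mean, assuming a toy per-tensor int8 symmetric quantizer (illustrative only, not bitsandbytes' or any library's actual scheme): the int8 weights cannot absorb the adapter delta directly, so you either keep the merged weight in float or requantize and take a fresh hit of quantization error.

```python
import torch

# Toy per-tensor int8 symmetric quantizer (illustrative only).
def quantize(w):
    scale = w.abs().max() / 127.0
    q = (w / scale).round().clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

torch.manual_seed(0)
W = torch.randn(64, 64)        # full-precision base weight
B = torch.randn(64, 8) * 0.01  # hypothetical LoRA factors
A = torch.randn(8, 64) * 0.01
delta = B @ A                  # low-rank adapter update

q, s = quantize(W)             # what actually ships for low-precision inference

# Merging cannot happen on the int8 values: dequantize, add the delta in
# float, then either keep the merged weight in float (losing the memory win)
# or requantize with a fresh round of quantization error.
merged = dequantize(q, s) + delta
q2, s2 = quantize(merged)

err = (dequantize(q2, s2) - merged).abs().max()
print(f"max requantization error after merge: {err:.6f}")
```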
> After running `pip install openrlhf[vllm,ring,liger]`, it solves my problem. (I was using my own conda environment beforehand)

@lyf07 This issue only comes up when installing vllm in editable mode. If you...
@ArthurZucker @matthewdouglas I tried this fix but I'm having similar NCCL issues to the ones you had. Unfortunately, your suggestion to upgrade to the latest version is not working. I understand you have...
> The loss is scaled so that it is parallelism-agnostic. Otherwise, you will find that these losses' magnitudes differ if you use a different sp_size, which should not be...
@vermouth1992 It's roughly the same:

```
> metrics['actor/pg_loss']
[-0.3054201006889343, 0.0, -0.2829209864139557, -0.16481846570968628, -0.15159031748771667]
> self.ulysses_sequence_parallel_size
1
> metrics['actor/pg_loss']
[1.2069861888885498, -0.24875977635383606, -0.2340698093175888, -0.1880173534154892, 0.0, -0.07240534573793411, 0.0, 0.0, -0.1661381721496582]
> ...
```
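For intuition on the scaling point above, here is a small toy sketch (my own illustration, not verl's actual implementation): summing per-shard losses and normalizing by the global token count is invariant to the sequence-parallel degree, whereas averaging per-shard means drifts whenever the shards end up with unequal token counts.

```python
import torch

torch.manual_seed(0)
token_losses = torch.randn(10)  # per-token losses for one sequence

def scaled_loss(losses, sp_size):
    # Sum each shard, normalize by the *global* token count:
    # invariant to how the tokens are sharded across ranks.
    shards = losses.chunk(sp_size)
    return sum(s.sum() for s in shards) / losses.numel()

def naive_loss(losses, sp_size):
    # Mean of per-shard means: only matches when shards are equal-sized.
    shards = losses.chunk(sp_size)
    return sum(s.mean() for s in shards) / len(shards)

for sp in (1, 2, 4):
    print(sp, scaled_loss(token_losses, sp).item(),
          naive_loss(token_losses, sp).item())
```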
@fingertap @toslali-ibm @youkaichao I was able to reproduce the bug in a **pure vLLM setting** in an A100 environment on Python 3.12. Therefore, I do not believe this issue is due...
@tlrmchlsmth @yury-tokpanov We also recently opened a PR to add a new model (Bamba) that requires Mamba v2 support. It works for continuous batching, but supporting chunked prefill can...