Mor Zusman
Opened a PR fixing this issue https://github.com/microsoft/DeepSpeed/pull/2828
@zhen-jia I also encountered this bug and fixed it by simply changing `ATTN_THREADS` to 512.
> May I get an update regarding the status of this PR? It seems the author stopped working on it? We're still working on it. The PR works well,...
AFAIU, the CI `distributed-tests-2-gpus` test fails regardless of this PR.
> QQ: Does this PR support parallel sampling (i.e., `n` > 1 in sampling params)? While I don't think it is not necessary to support parallel sampling in this PR,...
Tests failed due to timeouts to HF. Ready to be merged.
@daphneOdera-618 Yeah, the default setup.py behaviour is to download the upstream wheel upon "installing". To force a build from source, set `MAMBA_FORCE_BUILD=TRUE` when installing: `MAMBA_FORCE_BUILD=TRUE pip install .`
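For illustration, here is a minimal sketch of how a setup.py could gate this choice on an environment variable. The helper names `should_force_build` and `choose_install_path` are hypothetical, and the exact-match check against the string `"TRUE"` is an assumption, not a copy of the actual setup.py:

```python
import os


def should_force_build(environ=None):
    """Return True when the MAMBA_FORCE_BUILD flag requests a source build.

    Assumption: only the exact string "TRUE" enables the forced build;
    the real setup.py may parse the variable differently.
    """
    env = os.environ if environ is None else environ
    return env.get("MAMBA_FORCE_BUILD", "FALSE") == "TRUE"


def choose_install_path(environ=None):
    """Pick between building locally and fetching a prebuilt wheel (hypothetical helper)."""
    if should_force_build(environ):
        return "build-from-source"
    return "download-prebuilt-wheel"


if __name__ == "__main__":
    # Reads the real process environment when run as a script.
    print(choose_install_path())
```

Under this assumption, a value such as `MAMBA_FORCE_BUILD=1` would not trigger a source build, so the variable should be set to `TRUE` exactly as in the command above.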
@simon-mo Are there any blockers for merge? Thanks
@toslunar In the tests we've done with bf16, we haven't found evidence of quality degradation. Can you share some results or benchmarks?