Connor Holmes
Thanks @tomerip for looking back into this. I think this does appear to be the same underlying issue as https://github.com/microsoft/DeepSpeed/issues/2357. A fix for this will likely come from https://github.com/microsoft/DeepSpeed/pull/2433, but...
Changes fixed under later memory refactor.
Adding quantization support is a high priority item on our roadmap! We are working to add support for this soon and as the timeline becomes more concrete will share more...
Thank you both for looking into this. I've made a PR (https://github.com/microsoft/DeepSpeed/pull/3046) to clean up this scheduling code such that it should work for our full range of supported sequence...
Hi @abacaj, I have created a PR (https://github.com/microsoft/DeepSpeed/pull/3256) where I am now seeing results align between DeepSpeed and the HuggingFace baseline. If you could validate in your environment as well...
Hi @publicstaticvo, thank you for reporting this issue. Currently, the Hybrid Engine is only supported for the OPT family of models, but additional model support (including GPT-J) is on our...
Hi @tmatup, within DeepSpeed, we control which devices are visible by setting the `CUDA_VISIBLE_DEVICES` environment variable, as you can see in the final line in your log. The practical impact...
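To illustrate the `CUDA_VISIBLE_DEVICES` behavior mentioned above: the CUDA runtime renumbers the listed GPUs from zero, so a process sees logical device 0 as the first id in the list. The sketch below only simulates that renumbering in plain Python; it is not DeepSpeed's launcher code, and the helper name `visible_device_map` is hypothetical.

```python
def visible_device_map(env_value):
    """Simulate how a CUDA_VISIBLE_DEVICES-style string renumbers GPUs.

    Hypothetical helper for illustration only. Returns a dict mapping the
    logical index a process would see to the physical id from the list.
    """
    # Split the comma-separated list and drop empty entries.
    physical_ids = [tok.strip() for tok in env_value.split(",") if tok.strip()]
    # Logical indices always start at 0, in the order the ids are listed.
    return dict(enumerate(physical_ids))


if __name__ == "__main__":
    # Assumed example value: expose only physical GPUs 2 and 3.
    print(visible_device_map("2,3"))  # {0: '2', 1: '3'}
```

Under this assumption, a process started with `CUDA_VISIBLE_DEVICES=2,3` would address physical GPU 2 as `cuda:0` and physical GPU 3 as `cuda:1`.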
Hi all, sorry for the slow response time on this! I have created a PR (https://github.com/microsoft/DeepSpeed/pull/3256) where I am now seeing model outputs match the HuggingFace baseline. If anyone has...
> @cmikeh2 the nv-mii test and amd test failed again. Do you think it's related to my modification? Or just need to retry?

I think it’s likely unrelated. We sometimes...
Hi @zelcookie, thanks for reporting this. I am able to reproduce with your scripts and will work on determining a root cause of this.