Conglong Li
Hi @hubertlu-tw @jithunnair-amd, @awan-10 and I worked on this 1-bit compression and the 1-bit Adam, 0/1 Adam, and 1-bit LAMB optimizers. First, compressed AllReduce is necessary because you can't do compression communication...
@bm-synth This is a completely new technique. For this kind of contribution, based on the guideline (https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#new-feature-contribution-guidelines) we would need to first judge the value based on some formal evaluation...
> @conglongli @mrwyattii I added some information to this PR in line with the [new contributions page](https://github.com/microsoft/DeepSpeed/blob/master/CONTRIBUTING.md#new-feature-contribution-guidelines) you sent. The logic for this PR is done and the example works...
Hi @bigeagle, we want to confirm one thing before reviewing your PR: is your example using the DeepSpeed-Chat framework, or is it more like a standalone example only using some...
> Hi @conglongli, this PR does not use deepspeed-chat framework. I think it's a bit confusing that this repo is named as `deepspeed-examples`, instead of `deepspeed-chat`, especially to those who...
Closing, as the issue seems to be solved. Feel free to reopen for any further questions.
Closing because (1) this is a dev branch belonging to another person and (2) that person already opened a PR, https://github.com/microsoft/DeepSpeedExamples/pull/291 (already merged), before this PR. @jawomg please refrain from...
(A bit off-topic, but curriculum learning is actually available in all three repos :) https://github.com/microsoft/DeepSpeedExamples/tree/master/Megatron-LM-v1.1.5-ZeRO3/curriculum_learning, https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/curriculum_learning, and https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/main/examples/curriculum_learning. The latter two are basically the same, and the difference between the...
Regarding the relationship between the three repos, on the Microsoft DeepSpeed side we do plan to make https://github.com/microsoft/Megatron-DeepSpeed the only showcase for DeepSpeed for Megatron examples, because the hard copies...
Right, let me raise this TODO with our team.