Mayank Mishra

Results 187 comments of Mayank Mishra

@xk503775229 https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/339 This PR adds support for BLOOM ds-inference with fp16 and int8. The README is not up-to-date yet. I will work on fixing that.

Can you share the code snippet you used for loading GPT? Also, DS-inference currently uses special fp16 CUDA kernels for inference, which is not the case for int8. int8 CUDA...

In general, the code is only supposed to work with Megatron checkpoints. But there is an exception for BLOOM. Not sure about the reason. @jeffra can you comment? I am...

I don't think you will be able to do this on a 24 GB GPU. I am guessing you are using an RTX 3090? You can give it a try.

Did you use Megatron? Or does DeepSpeed have support for tensor parallelism?

@sarthaklangde I have the same issue. I believe this might be due to the internal PCIe tree implementation being different. @stas00 FYI

I don't believe it has anything to do with your environment.

Hi, I am not sure, but in the original Megatron code there was an argument (I don't remember the name) that resets the optimizer, dataloader, etc., which you could use to do...

@gordicaleksa , I think everyone is using Linux here. Also, this is not the correct place to ask this. Please create an issue [here](https://github.com/NVIDIA/apex)

This is expected behaviour, @henan991201. When you reshard the model, the order of operations changes, and floating-point operations are not associative. Refer to https://github.com/pytorch/pytorch/issues/76232
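
A minimal sketch of the non-associativity point (plain Python floats, not tied to any particular model or resharding code):

```python
# IEEE-754 floating-point addition is not associative: regrouping the
# same operands can change the result because of rounding.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b cancels exactly to 0.0, then + 1.0 -> 1.0
right = a + (b + c)  # b + c rounds back to -1e16 (the 1.0 is lost), so -> 0.0

print(left, right)  # 1.0 0.0 — same numbers, different grouping, different answer
```

Resharding a model changes how partial sums are grouped across devices, so small numerical differences like this are expected rather than a bug.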