Mayank Mishra

Results 187 comments of Mayank Mishra

@xk503775229 https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/339 This PR adds support for BLOOM ds-inference with fp16 and int8. The README is not up-to-date yet. I will work on fixing that.

Can you share the code snippet you used for loading GPT? Also, DS-inference currently uses special fp16 CUDA kernels for inference, which is not the case for int8. int8 CUDA...

In general, the code is only supposed to work with Megatron checkpoints. But there is an exception for BLOOM. Not sure about the reason. @jeffra can you comment? I am...

I don't think you will be able to do this on a 24 GB GPU. I am guessing you are using an RTX 3090? You can give it a try.

Did you use Megatron? Or does DeepSpeed have support for tensor parallelism?

@sarthaklangde I have the same issue. I believe this might be due to the internal PCIe tree implementation being different. @stas00 FYI

I don't believe it has anything to do with your environment.

Hi, I am not sure, but in the original Megatron code there was an argument (I don't remember the name) that resets the optimizer, dataloader, etc., which you could use to do...

@gordicaleksa , I think everyone is using Linux here. Also, this is not the correct place to ask this. Please create an issue [here](https://github.com/NVIDIA/apex)

This is expected behaviour, @henan991201. When you reshard the model, the order of operations changes, and floating-point operations are not associative. Refer to https://github.com/pytorch/pytorch/issues/76232
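
A minimal sketch of the non-associativity point (plain Python floats, not tied to any particular model or resharding code):

```python
# IEEE-754 floating-point addition is not associative: regrouping the
# same operands can change the result because of rounding.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b cancels exactly to 0.0, then + 1.0 -> 1.0
right = a + (b + c)  # b + c rounds back to -1e16 (the 1.0 is lost), so -> 0.0

print(left, right)  # 1.0 0.0 — same numbers, different grouping, different answer
```

Resharding a model changes how partial sums are grouped across devices, so small numerical differences like this are expected rather than a bug.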