Reza Yazdani

Showing 95 comments by Reza Yazdani

The results might differ, but is the difference meaningful when using the kernels? Can you please paste the output?

Interesting, I am not sure what is different between our system environments that makes us see different results! I am using Torch 1.12 + CUDA 11.6, and I see similar results between HF and...

Can you please paste the whole log? I want to see the transformer configuration.

Sorry, I meant the output log when you are running the test.

Hi @AlexWortega, I retried running this with the same test above. I also modified it to be similar to yours. However, I am still seeing similar results between...
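
For context, here is a minimal sketch of the kind of parity test being discussed: generate greedily with a plain Hugging Face model, then again after DeepSpeed kernel injection, and compare the outputs. The model name and prompt are placeholders, not the exact setup from this thread.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute the model from the issue
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).half().cuda()

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to("cuda")

# Baseline: plain Hugging Face generation (greedy, hence deterministic).
hf_out = model.generate(**inputs, max_new_tokens=32, do_sample=False)

# Same weights, but with DeepSpeed-Inference kernel injection.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)
ds_out = ds_engine.module.generate(**inputs, max_new_tokens=32, do_sample=False)

print(tokenizer.decode(hf_out[0]))
print(tokenizer.decode(ds_out[0]))
```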

Hi @AlexWortega , Here is the output of the two commands:
```
ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at...
```

Hi @reymondzzzz Thanks for the PR. I see this can fix some assumptions we make about model size or batch size at runtime. But would you mind giving a...

Hi @mallorbc , The problem is that the model selected from HF is FP32, and it will load the checkpoint before reaching the model partitioning on the DeepSpeed-Inference side. For...
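
A minimal sketch of the workaround implied here, assuming the goal is to avoid materializing the FP32 checkpoint before DeepSpeed partitions the model: ask Hugging Face to load the weights directly in half precision via `torch_dtype`. The model name is a placeholder.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Load the checkpoint directly as FP16 so a full FP32 copy is never
# materialized before DeepSpeed's partitioning step runs.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                  # placeholder model
    torch_dtype=torch.float16,
)

# Kernel injection / partitioning then operates on the FP16 weights.
engine = deepspeed.init_inference(
    model,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)
```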

> By changing the pipeline to the following I now get VRAM usage of roughly 12GB per GPU. However, shouldn't the model be split over both GPUs and thus roughly...
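
For reference, a hedged sketch of how a model is typically sharded across two GPUs with DeepSpeed-Inference tensor parallelism (`mp_size=2`), which is what brings per-GPU VRAM down to roughly half the model's footprint. This follows the standard DeepSpeed-Inference pipeline recipe; the model name is a placeholder. Launch it with `deepspeed --num_gpus 2 script.py`.

```python
import os
import torch
import deepspeed
from transformers import pipeline

# Under the deepspeed launcher, each rank runs in its own process.
local_rank = int(os.getenv("LOCAL_RANK", "0"))

pipe = pipeline("text-generation", model="gpt2", device=local_rank)

# mp_size=2 shards the weights across both GPUs (tensor parallelism),
# so each GPU holds roughly half of the parameters.
pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=2,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)

print(pipe("DeepSpeed is", max_new_tokens=32))
```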

Hi @trianxy , I think I know where this issue is coming from. It is caused by the max-tokens being capped at 128 [here](https://github.com/microsoft/DeepSpeed/blob/master/csrc/transformer/inference/includes/custom_cuda_layers.h#L20). We have a [PR](https://github.com/microsoft/DeepSpeed/pull/2212) to fix this...
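
As a stopgap until that fix lands, later DeepSpeed releases appear to expose the preallocated token budget that the linked header hard-codes. The sketch below assumes a `max_out_tokens` keyword on `init_inference`; treat that name as an assumption and check the signature of your installed version before relying on it.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                  # placeholder model
    torch_dtype=torch.float16,
)

engine = deepspeed.init_inference(
    model,
    dtype=torch.half,
    replace_with_kernel_inject=True,
    max_out_tokens=1024,     # assumed kwarg: raises the 128-token kernel cap
)
```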