pai4451 comments

Results: 30 comments of pai4451

Errors in generation (Bloom) when changing options sampling/use_cache

> @pohunghuang-nctu can you confirm your CUDA version?
> I was using 11.6 and getting the same issue.
> Using 11.3 resolved it for me. Please give it a try....

Errors in generation (Bloom) when changing options sampling/use_cache

> I only have a single node with 8 GPUs, 80GB each.
> Are you using pipeline parallel across nodes? Does DS-inference support that?

@mayank31398 Thanks. I just launched DeepSpeed...

Errors in generation (Bloom) when changing options sampling/use_cache

@mayank31398 I don’t think there is much advantage in using multi-node for inference. We need multi-node inference only because all we have are several 8x A6000 48GB servers.

Errors in generation (Bloom) when changing options sampling/use_cache

> @pohunghuang-nctu can you confirm your cuda version? I was using 11.6 and getting the same issue. Using 11.3 resolved it for me. Please give it a try.

Thanks @mayank31398...

Errors in generation (Bloom) when changing options sampling/use_cache

@mayank31398 My impression is that the `illegal memory access` error depends on the number of input tokens rather than the number of generated tokens. I can also generate two...

Errors in generation (Bloom) when changing options sampling/use_cache

@RezaYazdaniAminabadi I can share my findings. I use two 8x A6000 (48GB) nodes for inference, and when the input has more than 600 tokens it always leads to the CUDA...

DeepSpeed inference support for int8 parameters on BLOOM?

> @pai4451 [#328 (comment)](https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/328#discussion_r954402510)
> you can use these instructions for quantization.
> However, this is a barebones script.
> I would encourage you to wait for this PR: #328
> ...

VSCode plugin

APIConnectionError: Error communicating with OpenAI.

APIConnectionError: Error communicating with OpenAI.