pai4451

Results 30 comments of pai4451

> @pohunghuang-nctu can you confirm your cuda version? I was using 11.6 and getting the same issue. Using 11.3 resolved it for me. Please give it a try....

> I only have a single node with 8 GPUs, 80GB each. Are you using pipeline parallel across nodes? Does DS-inference support that?

@mayank31398 Thanks. I just launched DeepSpeed...

@mayank31398 I don’t think there is much advantage in using multi-node for inference. We need multi-node inference only because all we have are several 8x A6000 48GB servers.

> @pohunghuang-nctu can you confirm your cuda version? I was using 11.6 and getting the same issue. Using 11.3 resolved it for me. Please give it a try. Thanks @mayank31398...

@mayank31398 My impression is that the `illegal memory access` error depends on the number of input tokens rather than the number of generated tokens. I can also generate two...

@RezaYazdaniAminabadi I can share my findings. I use two 8x A6000 (48G) nodes for inference, and when the input is longer than 600 tokens it always leads to the CUDA...

> @pai4451 [#328 (comment)](https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/328#discussion_r954402510) you can use these instructions for quantization. However, this is a barebones script. I would encourage to wait for this PR: #328...

Hi @shadowwider, where is the attached JS? I also want to use the GitHub Copilot extension with a FauxPilot server.

> Was this issue resolved? I'm also using FastAPI and I face this issue after some time. Initially I receive the responses and after some idle time I receive the same...

> I am having this issue with Langchain, FastAPI and StreamingResponse in Docker. I am using LCEL, including standard Runnables and custom Runnables. The issue occurs both when generating via...