Blake

Results: 121 comments by Blake

I have also encountered this error. Small inputs, such as the one the tutorial uses ("DeepSpeed is"), produce normal results, but a significantly longer input leads to an illegal...

> FYI @mallorbc , @tomeras91 , @RezaYazdaniAminabadi : > > My related issue which I detailed above is fixed in [this PR](https://github.com/microsoft/DeepSpeed/pull/2212). More precisely, my issue does not appear when...

```python
def get_limit_for_user():
    request = _request_ctx_var.get()
    key = request.cookies.get("key")
```

You have to pass values through cookies. Passing them in the body will not work because of async.

So pipeline parallelism and ZeRO 2/3 are not compatible? How would one train a large model, say 20B parameters, without pipeline parallelism while using CPU offload, even if just a...

Pretty sure I also have this issue. I am trying to use dynamic input sizes for GPT models. I will look at the PR later to see if it helps.

I experienced this issue as well for tag 0.7.7 when trying to use DeepSpeed inference for GPTJ. This issue occurred for both float16 and float32. I am about to test...

```python
model = AutoModelForCausalLM.from_pretrained(self.config.model, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(self.config.model)
local_rank = 0
world_size = 1
generator = pipeline('text-generation', model=model, tokenizer=tokenizer,
                     device=local_rank, torch_dtype=torch.float16)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.half,
                                           replace_method='auto',
                                           max_tokens=self.config.max_tokens,
                                           replace_with_kernel_inject=True)
```

Can confirm I have the same issue with the master branch.

@lokoppakmsft it may be a good idea to reopen this issue.

Closing issue. Reopen if the issue still persists.