Blake

Results: 121 comments by Blake

I have also encountered this error. Small inputs, such as the one the tutorial uses ("DeepSpeed is"), produce normal results, but a significantly longer input leads to an illegal...

> FYI @mallorbc , @tomeras91 , @RezaYazdaniAminabadi : > > My related issue which I detailed above is fixed in [this PR](https://github.com/microsoft/DeepSpeed/pull/2212). More precisely, my issue does not appear when...

```python
def get_limit_for_user():
    request = _request_ctx_var.get()
    key = request.cookies.get("key")
```

You have to pass values through cookies. Passing them in the body will not work because of async.

So pipeline parallelism and ZeRO 2/3 are not compatible? How would one train a large model, say 20B parameters, without pipeline parallelism while using CPU offload, even if just a...

Pretty sure I also have this issue. I am trying to use dynamic input sizes for GPT models. I will look at the PR later to see if it helps.

I experienced this issue as well for tag 0.7.7 when trying to use DeepSpeed inference for GPTJ. This issue occurred for both float16 and float32. I am about to test...

```python
model = AutoModelForCausalLM.from_pretrained(self.config.model, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(self.config.model)
local_rank = 0
world_size = 1
generator = pipeline('text-generation', model=model, tokenizer=tokenizer,
                     device=local_rank, torch_dtype=torch.float16)
generator.model = deepspeed.init_inference(generator.model,
                                           mp_size=world_size,
                                           dtype=torch.half,
                                           replace_method='auto',
                                           max_tokens=self.config.max_tokens,
                                           replace_with_kernel_inject=True)
```

Can confirm I have the same issue with the master branch.

@lokoppakmsft it may be a good idea to reopen this issue.

Closing issue. Reopen if the issue still persists.