Mayank Mishra
I haven't tried adjusting the input tokens, @thies1006. But I can confirm that I ran with input text = "Hello" and generated tokens of 10, 50, 100, 300, 500, 1000, 2000, 5000....
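For reference, a minimal sketch of how such a sweep could look with `generate`; the checkpoint name and setup below are placeholders, not the exact benchmark script.

```python
# Hypothetical sketch of the generation-length sweep described above.
# The model name and hardware setup are assumptions, not the actual benchmark code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"  # assumption: the BLOOM checkpoint under discussion
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
for n in [10, 50, 100, 300, 500, 1000, 2000, 5000]:
    out = model.generate(**inputs, max_new_tokens=n)
    print(n, tokenizer.decode(out[0], skip_special_tokens=True)[:80])
```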
I see, @pai4451. I'll give it a shot.
@RezaYazdaniAminabadi any follow-up on this? I am facing similar CUDA issues with longer input sequence lengths.
@RezaYazdaniAminabadi I am also not sure, but BLOOM is trained using ALiBi, so ideally there should be no limit. I understand that this might not be possible. But GPT-3 allowed input...
@pai4451 You can't use it that way. Please refer to this config: https://www.deepspeed.ai/docs/config-json/#weight-quantization Let me know if it works ;)
As an alternative, you can use it in HuggingFace too. I haven't tried it either though.
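In case it helps, here is a minimal sketch of what the HuggingFace route might look like, assuming the 8-bit path via bitsandbytes (`load_in_8bit`) is what's meant; as said, I haven't verified this on BLOOM myself.

```python
# Sketch only: assumes transformers + accelerate + bitsandbytes are installed,
# and that the bitsandbytes int8 path is the HuggingFace alternative meant above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # shard across available GPUs
    load_in_8bit=True,   # int8 weights via bitsandbytes
)

inputs = tokenizer("Hello", return_tensors="pt").to(0)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```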
@pai4451 You can use these instructions for quantization: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/328#discussion_r954402510 However, this is a barebones script. I would encourage you to wait for this PR: https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/328 Planning to add server + CLI...
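Roughly, the int8 path boils down to a `deepspeed.init_inference` call like the one below; this is a sketch of the general approach under the assumptions noted in the comments, not the exact script from the PR.

```python
# Rough sketch of int8 DeepSpeed-inference for BLOOM; argument values are
# illustrative, and the linked barebones script / PR is the authoritative version.
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM

world_size = int(os.getenv("WORLD_SIZE", "1"))  # launched via the deepspeed launcher
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom", torch_dtype=torch.float16
)

model = deepspeed.init_inference(
    model,
    mp_size=world_size,              # tensor-parallel degree
    dtype=torch.int8,                # quantized inference
    replace_with_kernel_inject=True, # use DeepSpeed's fused inference kernels
)
```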
Quantization with int8 requires knowledge distillation and might need significant compute. Read the ZeroQuant paper. I would suggest getting internet access on the node if you can. I don't...
Also, can you provide me the DeepSpeed config you use to run on 16 GPUs? I don't know how to reshard for pipeline parallelism. Do you save the resharded weights?...
This is still a WIP, @stas00.