Mayank Mishra

Results 187 comments of Mayank Mishra

I see, @stas00. I have a script that does that for you. Are you saying the node you have can only be accessed from a login node or something? If...

Also, switching from Flask to Starlette + FastAPI fixed the memory leak.
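For context, a minimal sketch (not the actual server code from the PR) of the FastAPI/Starlette endpoint pattern that replaced the Flask app; the endpoint path and payload fields here are hypothetical:

```python
# Minimal sketch of a FastAPI generation endpoint; hypothetical route and
# payload, shown only to illustrate the Flask -> FastAPI/Starlette switch.
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class GenerateRequest(BaseModel):
    text: List[str]            # batch of input prompts
    max_new_tokens: int = 20


@app.post("/generate/")
def generate(request: GenerateRequest):
    # The real server would call model.generate(...) here; this stub only
    # shows the request/response plumbing.
    outputs = [f"<generated continuation of: {t}>" for t in request.text]
    return {"text": outputs, "num_generated_tokens": request.max_new_tokens}
```

Served with something like `uvicorn server:app`.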

So, I don't really think it's a memory leak @stas00: https://github.com/huggingface/accelerate/issues/614. I want to look more into this. This is what I plan to do: https://github.com/huggingface/accelerate/issues/614#issuecomment-1224285433 Should give us an...

For now, I will investigate this further. Let's not merge this PR yet :)

> The DS-inference server crashes for me w/o the cached tp checkpoint
>
> ```
> deepspeed --num_gpus 8 scripts/bloom-inference-server/benchmark.py --model_name bigscience/bloom --dtype fp16 --deployment_framework ds_inference --benchmark_cycles 5
> [...]
> ```
...

I feel like an empty prompt is fine. This is basically unconditional generation, right? I will try to fix the keyboard interrupt.
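For what it's worth, here is a minimal sketch of what an empty prompt reduces to with HF `generate`. The small BLOOM checkpoint is only an example, and the fallback to the BOS token is my assumption about how the tokenizer behaves on an empty string:

```python
# Sketch: an empty prompt is effectively unconditional generation — the
# model just continues from the BOS token. Model name is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

input_ids = tokenizer("", return_tensors="pt").input_ids
if input_ids.numel() == 0:
    # Assumption: if the empty string tokenizes to nothing, seed the
    # generation with the BOS token instead.
    input_ids = tokenizer(tokenizer.bos_token, return_tensors="pt").input_ids

output = model.generate(input_ids, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```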

Also, @stas00, I have been meaning to ask: is accelerate using DeepSpeed ZeRO as its backend? Because if that is the case, then the generation time per batch for both...

@xuyifanbupt if you are trying to deploy BLOOM 176B as a server, you can find it [here](https://github.com/bigscience-workshop/Megatron-DeepSpeed/pull/325)

@asafkar, the DS-inference script is compatible with HF checkpoints.

Wait @asafkar, does DS-inference support pipeline parallelism? I thought it was only tensor parallel for generation.