Stas Bekman
> So, just to summarize for cli.py: You want the option for the user to provide an input text file? If we keep the original scripts as they are I...
> I have added back the older scripts in scripts/inference folder (original path).

And let's move the 2 solutions to their respective sub-dirs:

```
bloom-inference-scripts
bloom-inference-server
```

we don't want...
> I assume these 2 folders should be located inside `scripts/` yes, please.
Looks great, @mayank31398 - thank you! I need to think about how I could test the functionality. It might work from the local client on the same host (as the node...
so is there a memory leak still or not? You shared that there still was one, but then removed the comment. If there is, we should try to isolate it - by...
It looks like the `transformers` internal APIs have changed and the old hack I used for `get_checkpoint_files` no longer works. Here is the new drop-in replacement and the good...
when I run:

```
python scripts/bloom-inference-server/cli.py --model_name bigscience/bloom --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
```

the first input went ok, but when it returned it...
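For reference, a `--generate_kwargs` JSON string like the one above can be turned into a keyword-argument dict with a small argparse/json shim. This is a minimal sketch - the helper name and validation are assumptions, not the repo's actual cli.py code:

```python
import argparse
import json

def parse_generate_kwargs(raw: str) -> dict:
    """Parse a JSON object string such as the --generate_kwargs value.

    Hypothetical helper for illustration; cli.py may do this differently.
    """
    kwargs = json.loads(raw)
    if not isinstance(kwargs, dict):
        raise ValueError("--generate_kwargs must be a JSON object")
    return kwargs

parser = argparse.ArgumentParser()
parser.add_argument("--generate_kwargs", type=parse_generate_kwargs, default={})
args = parser.parse_args(
    ["--generate_kwargs",
     '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}']
)
print(args.generate_kwargs)  # -> {'min_length': 100, 'max_new_tokens': 100, 'do_sample': False}
```

Note that JSON `false` becomes Python `False`, so the resulting dict can be splatted directly into a `model.generate(**kwargs)` call.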
The DS-inference server crashes for me w/o the cached TP checkpoint:

```
deepspeed --num_gpus 8 scripts/bloom-inference-server/benchmark.py --model_name bigscience/bloom --dtype fp16 --deployment_framework ds_inference --benchmark_cycles 5
[...]
Traceback (most recent call last):...
```
> I feel like empty prompt is fine. This is basically unconditional generation right? Good question - I have never tried it - I guess it should work.
This could be done with monkey patching first and then later added upstream. I'm just not sure we should start working on it until this Issue is fixed https://github.com/microsoft/DeepSpeed/issues/1522. As...