Stas Bekman
> So, just to summarize for cli.py: You want the option for the user to provide an input text file? If we keep the original scripts as they are I...
> I have added back the older scripts in scripts/inference folder (original path).

And let's move the 2 solutions to their respective sub-dirs:

```
bloom-inference-scripts
bloom-inference-server
```

we don't want...
> I assume these 2 folders should be located inside `scripts/` yes, please.
Looks great, @mayank31398 - thank you! I need to think about how I could test the functionality. It might work from the local client on the same host (as the node...
so is there a memory leak still or not? You shared that there still was one, but then removed the comment. If there is, we should try to isolate it - by...
It looks like the `transformers` internal APIs have changed and the old hack I used for `get_checkpoint_files` no longer works. Here is the new drop-in replacement and the good...
when I run:

```
python scripts/bloom-inference-server/cli.py --model_name bigscience/bloom --dtype bf16 --deployment_framework hf_accelerate --generate_kwargs '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}'
```

the first input went ok, but when it returned it...
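For reference, a `--generate_kwargs` JSON string like the one above can be turned into a keyword-argument dict with a small argparse/json shim. This is a minimal sketch - the helper name and validation are assumptions, not the repo's actual cli.py code:

```python
import argparse
import json

def parse_generate_kwargs(raw: str) -> dict:
    """Parse a JSON object string such as the --generate_kwargs value.

    Hypothetical helper for illustration; cli.py may do this differently.
    """
    kwargs = json.loads(raw)
    if not isinstance(kwargs, dict):
        raise ValueError("--generate_kwargs must be a JSON object")
    return kwargs

parser = argparse.ArgumentParser()
parser.add_argument("--generate_kwargs", type=parse_generate_kwargs, default={})
args = parser.parse_args(
    ["--generate_kwargs",
     '{"min_length": 100, "max_new_tokens": 100, "do_sample": false}']
)
print(args.generate_kwargs)  # -> {'min_length': 100, 'max_new_tokens': 100, 'do_sample': False}
```

Note that JSON `false` becomes Python `False`, so the resulting dict can be splatted directly into a `model.generate(**kwargs)` call.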
The DS-inference server crashes for me w/o the cached TP checkpoint:

```
deepspeed --num_gpus 8 scripts/bloom-inference-server/benchmark.py --model_name bigscience/bloom --dtype fp16 --deployment_framework ds_inference --benchmark_cycles 5
[...]
Traceback (most recent call last):...
```
> I feel like empty prompt is fine. This is basically unconditional generation right? Good question - I have never tried it - I guess it should work.
This could be done with monkey patching first and then later added upstream. I'm just not sure we should start working on it until this Issue is fixed https://github.com/microsoft/DeepSpeed/issues/1522. As...