Sindhu Somasundaram
/kind feature Hello, we spoke in today's community meeting; I am creating this issue to track the progress of our integration. Referencing the steps involved: - Create a DJLServing ServingRuntime CR and...
/kind bug **What steps did you take and what happened:** This is in reference to this issue: [https://github.com/kserve/kserve/issues/2370](https://github.com/kserve/kserve/issues/2370). We introduced a new YAML runtime in this folder: [https://github.com/kserve/kserve/tree/master/config/runtimes](https://github.com/kserve/kserve/tree/master/config/runtimes). When...
**Describe the bug** We have tested this PR (https://github.com/microsoft/DeepSpeed/pull/2662) against a few models:
- OPT 1.3B, tp degree 2, fp16
- OPT 13B, tp degree 4, [fp16, int8]
- OPT 30B,...
**Describe the bug** We conducted tests on OPT/GPT-J/GPT-NeoX/BLOOM 7B INT8; these models are all producing garbage outputs on DeepSpeed 0.8.1. The OPT failure is an NCCL communication issue, and GPT-NeoX 20B is producing...
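For context on the configurations named in these two reports, here is a minimal sketch (not the exact test harness used) of how such a combination is typically exercised with DeepSpeed inference; the model name, tp degree, and dtype are placeholders for the combinations listed above:

```python
# Minimal sketch of a DeepSpeed inference run like the ones reported above.
# model_name, tp_degree, and dtype are placeholders for the listed combos.
# Launch with: deepspeed --num_gpus <tp_degree> this_script.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'facebook/opt-13b'   # placeholder
tp_degree = 4                     # "tp degree" in the reports
dtype = torch.float16             # torch.int8 for the INT8 runs

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Shard across tp_degree GPUs with kernel injection (DeepSpeed 0.8.x API).
engine = deepspeed.init_inference(model,
                                  mp_size=tp_degree,
                                  dtype=dtype,
                                  replace_with_kernel_inject=True)

inputs = tokenizer('DeepSpeed is', return_tensors='pt').to(torch.cuda.current_device())
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```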
Trying out T5 with the python backend (https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/run.py#L484). I see that SamplingConfig has output_log_probs (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L355), but the returned dict does not include the log probabilities (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2515). Is there any other way...
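Not an answer from the maintainers, but one workaround worth checking, sketched below. Reading `session.log_probs` after decode is an assumption based on the generation.py source linked above, not a documented API, and the `session` setup is elided:

```python
# Hypothetical workaround: if the return dict omits log probs, the
# GenerationSession may still hold the buffer it allocated for them.
# `session` is an already-built GenerationSession (setup elided);
# `session.log_probs` is an assumption, not a documented API.
from tensorrt_llm.runtime import SamplingConfig

sampling_config = SamplingConfig(end_id=1, pad_id=0)
sampling_config.output_log_probs = True  # the flag referenced above

outputs = session.decode(input_ids, input_lengths, sampling_config)

# Check whether the buffer survives on the session after decode.
log_probs = getattr(session, 'log_probs', None)
```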
/kind bug https://kserve.github.io/website/0.9/developer/developer/#run-e2e-tests-locally Trying to run the end-to-end tests locally in my Ubuntu environment. When I run `make undeploy`, it hangs for a long time. Is that expected? I...
## How to reproduce
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

if __name__ == '__main__':
    model_name = 'facebook/opt-30b'
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    from parallelformers import parallelize
    parallelize(model, num_gpus=8, fp16=True)
```
...
## Description - Extracting the rolling batch handling out of huggingface.py; the UX won't change, and the handler will still be `djl_python.huggingface`.
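Purely to illustrate the shape of the split described above; every name below (`RollingBatch`, `handle`, `decode_step`) is hypothetical, not the actual DJLServing code:

```python
# Hypothetical illustration of the refactor: rolling-batch state lives in
# its own module, while djl_python.huggingface keeps the same entry point
# and simply delegates to it. All names here are invented.
class RollingBatch:
    """Tracks in-flight requests and merges new ones into the batch."""
    def __init__(self):
        self.active_requests = []

    def add(self, request):
        self.active_requests.append(request)

    def step(self, model):
        # Run one decode step for every active request, then retire
        # the requests that have finished generating.
        results = [model.decode_step(r) for r in self.active_requests]
        self.active_requests = [r for r in self.active_requests if not r.done]
        return results

# huggingface.py would keep its existing handler and delegate:
_batch = RollingBatch()

def handle(inputs):                # unchanged UX: djl_python.huggingface
    _batch.add(inputs)
    return _batch.step(model)      # `model` provided by the handler setup
```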
## Description The partitioned checkpoints are uploaded under the key prefix of the provided bucket. There should be an option to override this prefix or to remove it entirely....
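To illustrate the requested behaviour, a minimal sketch using boto3; the option name `checkpoint_prefix` is hypothetical, not an existing DJLServing setting:

```python
# Sketch of a configurable upload prefix. `checkpoint_prefix` is a
# hypothetical option name; an empty value uploads to the bucket root
# instead of nesting under the provided prefix object.
import os
import boto3

s3 = boto3.client('s3')

def upload_checkpoints(local_dir, bucket, checkpoint_prefix=''):
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            key = f'{checkpoint_prefix}/{rel}' if checkpoint_prefix else rel
            s3.upload_file(path, bucket, key)
```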
## Description Right now, our default handlers expect a tokenizer. The right behaviour: 1. If an HF model_id is provided, then we get the tokenizer from there and just download it...
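A minimal sketch of step 1 of the behaviour described; the helper name `resolve_tokenizer` is hypothetical, and the remaining steps are truncated in the issue, so only this branch is shown:

```python
# Sketch of step 1: when an HF model_id is given, fetch the tokenizer
# straight from the Hub rather than requiring one alongside the model
# artifacts. `resolve_tokenizer` is a hypothetical helper, not existing
# handler code.
from transformers import AutoTokenizer

def resolve_tokenizer(model_id_or_path):
    # Works for both Hub ids (e.g. 'facebook/opt-1.3b') and local paths;
    # the Hub case downloads and caches the tokenizer automatically.
    return AutoTokenizer.from_pretrained(model_id_or_path)
```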