Sindhu Somasundaram
/kind feature Hello, we spoke in today's community meeting; I am creating this issue to track the progress of our integration. Referencing the steps involved: - Create a DJLServing ServingRuntime CR and...
/kind bug **What steps did you take and what happened:** This is in reference to this issue: [https://github.com/kserve/kserve/issues/2370](https://github.com/kserve/kserve/issues/2370). We introduced a new YAML runtime in this folder: [https://github.com/kserve/kserve/tree/master/config/runtimes](https://github.com/kserve/kserve/tree/master/config/runtimes). When...
**Describe the bug** We have tested this PR (https://github.com/microsoft/DeepSpeed/pull/2662) against a few models:
- OPT 1.3B, tp degree 2, fp16
- OPT 13B, tp degree 4, [fp16, int8]
- OPT 30B,...
**Describe the bug** We conducted tests on OPT/GPT-J/GPT-NeoX/BLOOM 7B INT8; these models are all producing garbage outputs on DeepSpeed 0.8.1. The OPT failure is an NCCL communication issue, and GPT-NeoX 20B is producing...
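For context on the configurations named in these two reports, here is a minimal sketch (not the exact test harness used) of how such a combination is typically exercised with DeepSpeed inference; the model name, tp degree, and dtype are placeholders for the combinations listed above:

```python
# Minimal sketch of a DeepSpeed inference run like the ones reported above.
# model_name, tp_degree, and dtype are placeholders for the listed combos.
# Launch with: deepspeed --num_gpus <tp_degree> this_script.py
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'facebook/opt-13b'   # placeholder
tp_degree = 4                     # "tp degree" in the reports
dtype = torch.float16             # torch.int8 for the INT8 runs

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Shard across tp_degree GPUs with kernel injection (DeepSpeed 0.8.x API).
engine = deepspeed.init_inference(model,
                                  mp_size=tp_degree,
                                  dtype=dtype,
                                  replace_with_kernel_inject=True)

inputs = tokenizer('DeepSpeed is', return_tensors='pt').to(torch.cuda.current_device())
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```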
Trying out T5 with the python backend (https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/enc_dec/run.py#L484). I see that SamplingConfig has output_log_probs (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L355), but the returned dict does not include the log probabilities (https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L2515). Is there any other way...
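Not an answer from the maintainers, but one workaround worth checking, sketched below. Reading `session.log_probs` after decode is an assumption based on the generation.py source linked above, not a documented API, and the `session` setup is elided:

```python
# Hypothetical workaround: if the return dict omits log probs, the
# GenerationSession may still hold the buffer it allocated for them.
# `session` is an already-built GenerationSession (setup elided);
# `session.log_probs` is an assumption, not a documented API.
from tensorrt_llm.runtime import SamplingConfig

sampling_config = SamplingConfig(end_id=1, pad_id=0)
sampling_config.output_log_probs = True  # the flag referenced above

outputs = session.decode(input_ids, input_lengths, sampling_config)

# Check whether the buffer survives on the session after decode.
log_probs = getattr(session, 'log_probs', None)
```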
/kind bug https://kserve.github.io/website/0.9/developer/developer/#run-e2e-tests-locally Trying to run the end-to-end tests locally in my Ubuntu environment. When I run `make undeploy`, it hangs for a long time. Is that expected? I...
## How to reproduce
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

if __name__ == '__main__':
    model_name = 'facebook/opt-30b'
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    from parallelformers import parallelize
    parallelize(model, num_gpus=8, fp16=True)
```
...
## Description - Extracting the rolling batch handling out of huggingface.py; the UX won't change, and the handler will still be `djl_python.huggingface`.
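Purely to illustrate the shape of the split described above; every name below (`RollingBatch`, `handle`, `decode_step`) is hypothetical, not the actual DJLServing code:

```python
# Hypothetical illustration of the refactor: rolling-batch state lives in
# its own module, while djl_python.huggingface keeps the same entry point
# and simply delegates to it. All names here are invented.
class RollingBatch:
    """Tracks in-flight requests and merges new ones into the batch."""
    def __init__(self):
        self.active_requests = []

    def add(self, request):
        self.active_requests.append(request)

    def step(self, model):
        # Run one decode step for every active request, then retire
        # the requests that have finished generating.
        results = [model.decode_step(r) for r in self.active_requests]
        self.active_requests = [r for r in self.active_requests if not r.done]
        return results

# huggingface.py would keep its existing handler and delegate:
_batch = RollingBatch()

def handle(inputs):                # unchanged UX: djl_python.huggingface
    _batch.add(inputs)
    return _batch.step(model)      # `model` provided by the handler setup
```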
## Description The partitioned checkpoints are uploaded under the key prefix of the provided bucket. There should be an option to override this prefix or to remove it entirely....
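To illustrate the requested behaviour, a minimal sketch using boto3; the option name `checkpoint_prefix` is hypothetical, not an existing DJLServing setting:

```python
# Sketch of a configurable upload prefix. `checkpoint_prefix` is a
# hypothetical option name; an empty value uploads to the bucket root
# instead of nesting under the provided prefix object.
import os
import boto3

s3 = boto3.client('s3')

def upload_checkpoints(local_dir, bucket, checkpoint_prefix=''):
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            rel = os.path.relpath(path, local_dir)
            key = f'{checkpoint_prefix}/{rel}' if checkpoint_prefix else rel
            s3.upload_file(path, bucket, key)
```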
## Description Right now, our default handlers expect a tokenizer. The right behaviour: 1. If an HF model_id is provided, then we get the tokenizer from there and just download it...
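A minimal sketch of step 1 of the behaviour described; the helper name `resolve_tokenizer` is hypothetical, and the remaining steps are truncated in the issue, so only this branch is shown:

```python
# Sketch of step 1: when an HF model_id is given, fetch the tokenizer
# straight from the Hub rather than requiring one alongside the model
# artifacts. `resolve_tokenizer` is a hypothetical helper, not existing
# handler code.
from transformers import AutoTokenizer

def resolve_tokenizer(model_id_or_path):
    # Works for both Hub ids (e.g. 'facebook/opt-1.3b') and local paths;
    # the Hub case downloads and caches the tokenizer automatically.
    return AutoTokenizer.from_pretrained(model_id_or_path)
```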