Results: 5 issues by Bang DaeYeong

Hello, first of all, thank you for creating this library. I have two questions. First, I followed [this guide](https://github.com/NVIDIA/FasterTransformer/blob/dev/v5.0_beta/docs/gptj_guide.md) and successfully started the Triton server. Here is my request...

### Describe the bug

I am saving a bento with MLflow (sentence-transformers):

```python
def save_model_to_mlflow(self, version):
    signature = mlflow.models.infer_signature(
        self.input_data, self.output_data
    )
    model_info: mlflow.models.model.ModelInfo = (
        mlflow.sentence_transformers.log_model(
            model=self.model,
            artifact_path=self.model_name,
            signature=signature,
            ...
```

bug

### Describe the bug

Gensim Word2Vec model:

```python
import os

import mlflow
from gensim.models.word2vec import Word2Vec


class ScappyWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        file_path = os.path.join(context.artifacts["model_path"], "scappy_base.bin")
        self.model = Word2Vec.load(file_path)

    def predict(self, ...
```

bug

### Describe the bug

The timeout settings for the API server and runners are not working in BentoML. I'm using BentoML version 1.0.20.post11. The default configuration is as follows:

```yaml
version: 1
...
```

bug
feedback-wanted
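
For context on the issue above, here is a minimal sketch of where request timeouts are usually placed in a BentoML 1.0.x `configuration.yaml`. The key names (`api_server.traffic.timeout`, `runners.timeout`) and values are assumptions based on the 1.0.x configuration schema and should be verified against the exact version in use (1.0.20.post11):

```yaml
# Hypothetical configuration.yaml sketch; key names assume the
# BentoML 1.0.x schema and should be checked for 1.0.20.post11.
version: 1
api_server:
  traffic:
    timeout: 300   # seconds the API server waits before timing out a request
runners:
  timeout: 300     # per-runner request timeout, in seconds
```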

1. Offline serving

   ![image](https://github.com/vllm-project/vllm/assets/43260218/87e216b5-9064-4c2a-a021-cac08e22795d)

2. Online serving (FastAPI)

   ![image](https://github.com/vllm-project/vllm/assets/43260218/322cc4a4-a78f-4212-a266-d586e8e2969d)
   ![image](https://github.com/vllm-project/vllm/assets/43260218/49c9cf76-ca3f-4362-95d8-191cbbdd3543)

Log:

```
INFO 12-11 21:50:36 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs, ...
```