Bang DaeYeong
Hello, first of all, thank you for creating this library. I have two questions. First, I followed [this guide](https://github.com/NVIDIA/FasterTransformer/blob/dev/v5.0_beta/docs/gptj_guide.md) and successfully started Triton Server. Here is my request...
### Describe the bug

I am saving a Bento with MLflow (sentence transformers).

```python
def save_model_to_mlflow(self, version):
    signature = mlflow.models.infer_signature(
        self.input_data, self.output_data
    )
    model_info: mlflow.models.model.ModelInfo = (
        mlflow.sentence_transformers.log_model(
            model=self.model,
            artifact_path=self.model_name,
            signature=signature,...
```
### Describe the bug

gensim word2vec model

```python
import os

import mlflow
from gensim.models.word2vec import Word2Vec


class ScappyWrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        file_path = os.path.join(context.artifacts["model_path"], "scappy_base.bin")
        self.model = Word2Vec.load(file_path)

    def predict(self,...
```
### Describe the bug

The timeout settings for `api_server` and `runner` are not working in BentoML. I'm using BentoML version 1.0.20.post11. The default configuration is as follows:

```yaml
version: 1...
```
1. offline serving (image)
2. online serving (FastAPI) (image) (image)

log: INFO 12-11 21:50:36 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs,...
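
To compare offline and online serving numerically rather than from screenshots, the throughput fields in log lines like the one above could be extracted and averaged. The following is a minimal sketch, assuming the log format matches the excerpt shown here; the helper names `parse_throughput` and `mean_generation_throughput` are hypothetical and not part of any library, and other versions of the engine may format this line differently.

```python
import re

# Matches the two throughput fields as they appear in the log excerpt above.
# Assumption: the wording and order of these fields are stable across lines.
THROUGHPUT_RE = re.compile(
    r"Avg prompt throughput: (?P<prompt>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<generation>[\d.]+) tokens/s"
)


def parse_throughput(line):
    """Return (prompt_tps, generation_tps) for one log line, or None if absent."""
    match = THROUGHPUT_RE.search(line)
    if match is None:
        return None
    return float(match.group("prompt")), float(match.group("generation"))


def mean_generation_throughput(lines):
    """Average generation throughput over matching lines; None if none match."""
    values = [p[1] for p in map(parse_throughput, lines) if p is not None]
    return sum(values) / len(values) if values else None


# The sample is the (truncated) log line from the report.
sample = (
    "INFO 12-11 21:50:36 llm_engine.py:649] Avg prompt throughput: 0.0 tokens/s, "
    "Avg generation throughput: 0.0 tokens/s, Running: 1 reqs, Swapped: 0 reqs,"
)
print(parse_throughput(sample))
```

Running the same helper over the logs captured in each serving mode would give two averages that can be compared directly.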