BentoML

The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!

Results 258 BentoML issues

### Describe the bug The command `bentoml build --containerize` copies a serious amount of data to the docker build. I'm not entirely sure where it comes from, but it's 29GB...

bug

### Describe the bug I deployed TheBloke/Mixtral-8x7B-v0.1-GPTQ using the vLLM backend. I get an error when I call the openllm query. When I try the API call to http://localhost:3000/v1/generate, it works...

bug

### Describe the bug I would like to use the `X-Amzn-SageMaker-Custom-Attributes` header to get a value needed for my sagemaker endpoint. Unfortunately, according to the section #7 of the readme...

bug

### Feature request When starting the service, bento tries to access the HF Hub, which is blocked in the deploy cluster. I found an internal parameter, sync_with_hub_version, to control this feature; would it be...

enhancement

### Describe the bug Hello BentoML! I am trying to use huggingface transformers as an inference runner after converting it to ONNX. I converted the model to ONNX via Optimum....

bug

### Feature request The default log configuration file in `./src/bentoml/_internal/monitoring/default.py` uses a TimedRotatingFileHandler which specifies a daily rotation (i.e., when: "D").
```
DEFAULT_CONFIG_YAML = """
version: 1
disable_existing_loggers: false
loggers:...
```

enhancement

### Describe the bug
```
class TransformersRunnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("nvidia.com/gpu", "cpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        super().__init__()
        available_gpus = os.getenv("CUDA_VISIBLE_DEVICES", "")
        # assign CPU resources
        kwargs = {}
        if available_gpus...
```

bug

As per https://docs.bentoml.org/en/latest/guides/monitoring.html, the monitoring log contains the following information:
```
$ cat monitoring/iris_classifier_prediction/data/data.1.log
{"sepal length": 5.9, "sepal width": 3.0, "petal length": 5.1, "petal width": 1.8, "pred": "virginica", "timestamp": "2023-12-26T11:10:16.112687",...
```

### Feature request I am deploying an object detector and have a scenario where I need to process a video. However, this does not appear to be supported yet. Can...

enhancement