Get shm memory status
Is your feature request related to a problem? Please describe.
When deploying many Python backend models, the Triton Inference Server runs out of shm memory.
Describe the solution you'd like
I want to get the shm memory status so I can check whether unused models need to be undeployed.
I checked the gRPC API, and there is no API related to it.
Is there any way to know the shm memory status?
type GRPCInferenceServiceClient interface {
//@@ .. cpp:var:: rpc ServerLive(ServerLiveRequest) returns
//@@ (ServerLiveResponse)
//@@
//@@ Check liveness of the inference server.
//@@
ServerLive(ctx context.Context, in *ServerLiveRequest, opts ...grpc.CallOption) (*ServerLiveResponse, error)
//@@ .. cpp:var:: rpc ServerReady(ServerReadyRequest) returns
//@@ (ServerReadyResponse)
//@@
//@@ Check readiness of the inference server.
//@@
ServerReady(ctx context.Context, in *ServerReadyRequest, opts ...grpc.CallOption) (*ServerReadyResponse, error)
//@@ .. cpp:var:: rpc ModelReady(ModelReadyRequest) returns
//@@ (ModelReadyResponse)
//@@
//@@ Check readiness of a model in the inference server.
//@@
ModelReady(ctx context.Context, in *ModelReadyRequest, opts ...grpc.CallOption) (*ModelReadyResponse, error)
//@@ .. cpp:var:: rpc ServerMetadata(ServerMetadataRequest) returns
//@@ (ServerMetadataResponse)
//@@
//@@ Get server metadata.
//@@
ServerMetadata(ctx context.Context, in *ServerMetadataRequest, opts ...grpc.CallOption) (*ServerMetadataResponse, error)
//@@ .. cpp:var:: rpc ModelMetadata(ModelMetadataRequest) returns
//@@ (ModelMetadataResponse)
//@@
//@@ Get model metadata.
//@@
ModelMetadata(ctx context.Context, in *ModelMetadataRequest, opts ...grpc.CallOption) (*ModelMetadataResponse, error)
//@@ .. cpp:var:: rpc ModelInfer(ModelInferRequest) returns
//@@ (ModelInferResponse)
//@@
//@@ Perform inference using a specific model.
//@@
ModelInfer(ctx context.Context, in *ModelInferRequest, opts ...grpc.CallOption) (*ModelInferResponse, error)
//@@ .. cpp:var:: rpc ModelStreamInfer(stream ModelInferRequest) returns
//@@ (stream ModelStreamInferResponse)
//@@
//@@ Perform streaming inference.
//@@
ModelStreamInfer(ctx context.Context, opts ...grpc.CallOption) (GRPCInferenceService_ModelStreamInferClient, error)
//@@ .. cpp:var:: rpc ModelConfig(ModelConfigRequest) returns
//@@ (ModelConfigResponse)
//@@
//@@ Get model configuration.
//@@
ModelConfig(ctx context.Context, in *ModelConfigRequest, opts ...grpc.CallOption) (*ModelConfigResponse, error)
//@@ .. cpp:var:: rpc ModelStatistics(
//@@ ModelStatisticsRequest)
//@@ returns (ModelStatisticsResponse)
//@@
//@@ Get the cumulative inference statistics for a model.
//@@
ModelStatistics(ctx context.Context, in *ModelStatisticsRequest, opts ...grpc.CallOption) (*ModelStatisticsResponse, error)
//@@ .. cpp:var:: rpc RepositoryIndex(RepositoryIndexRequest) returns
//@@ (RepositoryIndexResponse)
//@@
//@@ Get the index of model repository contents.
//@@
RepositoryIndex(ctx context.Context, in *RepositoryIndexRequest, opts ...grpc.CallOption) (*RepositoryIndexResponse, error)
//@@ .. cpp:var:: rpc RepositoryModelLoad(RepositoryModelLoadRequest) returns
//@@ (RepositoryModelLoadResponse)
//@@
//@@ Load or reload a model from a repository.
//@@
RepositoryModelLoad(ctx context.Context, in *RepositoryModelLoadRequest, opts ...grpc.CallOption) (*RepositoryModelLoadResponse, error)
//@@ .. cpp:var:: rpc RepositoryModelUnload(RepositoryModelUnloadRequest)
//@@ returns (RepositoryModelUnloadResponse)
//@@
//@@ Unload a model.
//@@
RepositoryModelUnload(ctx context.Context, in *RepositoryModelUnloadRequest, opts ...grpc.CallOption) (*RepositoryModelUnloadResponse, error)
//@@ .. cpp:var:: rpc SystemSharedMemoryStatus(
//@@ SystemSharedMemoryStatusRequest)
//@@   returns (SystemSharedMemoryStatusResponse)
//@@
//@@ Get the status of all registered system-shared-memory regions.
//@@
SystemSharedMemoryStatus(ctx context.Context, in *SystemSharedMemoryStatusRequest, opts ...grpc.CallOption) (*SystemSharedMemoryStatusResponse, error)
//@@ .. cpp:var:: rpc SystemSharedMemoryRegister(
//@@ SystemSharedMemoryRegisterRequest)
//@@ returns (SystemSharedMemoryRegisterResponse)
//@@
//@@ Register a system-shared-memory region.
//@@
SystemSharedMemoryRegister(ctx context.Context, in *SystemSharedMemoryRegisterRequest, opts ...grpc.CallOption) (*SystemSharedMemoryRegisterResponse, error)
//@@ .. cpp:var:: rpc SystemSharedMemoryUnregister(
//@@ SystemSharedMemoryUnregisterRequest)
//@@ returns (SystemSharedMemoryUnregisterResponse)
//@@
//@@ Unregister a system-shared-memory region.
//@@
SystemSharedMemoryUnregister(ctx context.Context, in *SystemSharedMemoryUnregisterRequest, opts ...grpc.CallOption) (*SystemSharedMemoryUnregisterResponse, error)
//@@ .. cpp:var:: rpc CudaSharedMemoryStatus(
//@@ CudaSharedMemoryStatusRequest)
//@@   returns (CudaSharedMemoryStatusResponse)
//@@
//@@ Get the status of all registered CUDA-shared-memory regions.
//@@
CudaSharedMemoryStatus(ctx context.Context, in *CudaSharedMemoryStatusRequest, opts ...grpc.CallOption) (*CudaSharedMemoryStatusResponse, error)
//@@ .. cpp:var:: rpc CudaSharedMemoryRegister(
//@@ CudaSharedMemoryRegisterRequest)
//@@ returns (CudaSharedMemoryRegisterResponse)
//@@
//@@ Register a CUDA-shared-memory region.
//@@
CudaSharedMemoryRegister(ctx context.Context, in *CudaSharedMemoryRegisterRequest, opts ...grpc.CallOption) (*CudaSharedMemoryRegisterResponse, error)
//@@ .. cpp:var:: rpc CudaSharedMemoryUnregister(
//@@ CudaSharedMemoryUnregisterRequest)
//@@ returns (CudaSharedMemoryUnregisterResponse)
//@@
//@@ Unregister a CUDA-shared-memory region.
//@@
CudaSharedMemoryUnregister(ctx context.Context, in *CudaSharedMemoryUnregisterRequest, opts ...grpc.CallOption) (*CudaSharedMemoryUnregisterResponse, error)
}
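For reference, a minimal sketch of how the SystemSharedMemoryStatus RPC from the interface above could be called from Go. The import path for the generated stubs, the constructor name, and the localhost:8001 endpoint are assumptions, not something from this issue; also note that, per the comment in the interface, this RPC only reports regions that were explicitly registered, so it may not reflect the shm consumed internally by the Python backend.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	triton "example.com/your_module/tritongrpc" // hypothetical import path for the generated stubs
)

func main() {
	// Assumed gRPC endpoint; adjust to your deployment.
	conn, err := grpc.Dial("localhost:8001",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	// Standard protoc-gen-go-grpc constructor for the interface shown above.
	client := triton.NewGRPCInferenceServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Leaving the region name empty requests the status of all registered regions.
	resp, err := client.SystemSharedMemoryStatus(ctx, &triton.SystemSharedMemoryStatusRequest{})
	if err != nil {
		log.Fatalf("SystemSharedMemoryStatus: %v", err)
	}

	for name, region := range resp.GetRegions() {
		fmt.Printf("region %s: key=%s offset=%d byte_size=%d\n",
			name, region.GetKey(), region.GetOffset(), region.GetByteSize())
	}
}
```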
Hi @Phelan164, are you running the Triton Inference Server inside a Docker container? The default Docker shm size might not be large enough for some models, but it can be increased by passing --shm-size=16g to the docker run command.
@kthui Yes, I run the Triton Inference Server inside Docker and could increase the shm-size, but as I deploy more and more Python backend models, it reaches the shm maximum. So I want to check shm every time I deploy a model and undeploy some unused models if I have to.
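Not an official workaround, but a rough sketch of that "check shm, then undeploy" idea: since the gRPC API only reports registered regions, one could instead watch the container's /dev/shm usage directly with statfs and call RepositoryModelUnload (from the interface above) when usage crosses a threshold. The 90% threshold, the "unused_model" name, the endpoint, and the stub import path below are all hypothetical, and the check must run inside the Triton container (or with access to its /dev/shm).

```go
package main

import (
	"context"
	"log"
	"time"

	"golang.org/x/sys/unix"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	triton "example.com/your_module/tritongrpc" // hypothetical import path for the generated stubs
)

// shmUsageFraction returns the fraction of /dev/shm currently in use.
func shmUsageFraction() (float64, error) {
	var st unix.Statfs_t
	if err := unix.Statfs("/dev/shm", &st); err != nil {
		return 0, err
	}
	total := st.Blocks * uint64(st.Bsize)
	free := st.Bavail * uint64(st.Bsize)
	return 1.0 - float64(free)/float64(total), nil
}

func main() {
	used, err := shmUsageFraction()
	if err != nil {
		log.Fatalf("statfs /dev/shm: %v", err)
	}
	log.Printf("/dev/shm usage: %.0f%%", used*100)

	// Hypothetical threshold: only unload when /dev/shm is over 90% full.
	if used < 0.9 {
		return
	}

	conn, err := grpc.Dial("localhost:8001",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()
	client := triton.NewGRPCInferenceServiceClient(conn)

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// "unused_model" is a placeholder; pick a model you know is idle.
	if _, err := client.RepositoryModelUnload(ctx, &triton.RepositoryModelUnloadRequest{
		ModelName: "unused_model",
	}); err != nil {
		log.Fatalf("RepositoryModelUnload: %v", err)
	}
	log.Println("unloaded unused_model")
}
```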
Closing issue due to lack of activity. Please re-open the issue if you would like to follow up.