
Get shm memory status

Phelan164 opened this issue 3 years ago · 2 comments

Is your feature request related to a problem? Please describe.
When deploying many Python backend models, the Triton Inference Server runs out of shm memory.

Describe the solution you'd like
I want to get the shm memory status so I can check whether I need to undeploy unused models. I checked the gRPC API, and there is no API related to this. Is there any way to know the shm memory status?

type GRPCInferenceServiceClient interface {
	//@@  .. cpp:var:: rpc ServerLive(ServerLiveRequest) returns
	//@@       (ServerLiveResponse)
	//@@
	//@@     Check liveness of the inference server.
	//@@
	ServerLive(ctx context.Context, in *ServerLiveRequest, opts ...grpc.CallOption) (*ServerLiveResponse, error)
	//@@  .. cpp:var:: rpc ServerReady(ServerReadyRequest) returns
	//@@       (ServerReadyResponse)
	//@@
	//@@     Check readiness of the inference server.
	//@@
	ServerReady(ctx context.Context, in *ServerReadyRequest, opts ...grpc.CallOption) (*ServerReadyResponse, error)
	//@@  .. cpp:var:: rpc ModelReady(ModelReadyRequest) returns
	//@@       (ModelReadyResponse)
	//@@
	//@@     Check readiness of a model in the inference server.
	//@@
	ModelReady(ctx context.Context, in *ModelReadyRequest, opts ...grpc.CallOption) (*ModelReadyResponse, error)
	//@@  .. cpp:var:: rpc ServerMetadata(ServerMetadataRequest) returns
	//@@       (ServerMetadataResponse)
	//@@
	//@@     Get server metadata.
	//@@
	ServerMetadata(ctx context.Context, in *ServerMetadataRequest, opts ...grpc.CallOption) (*ServerMetadataResponse, error)
	//@@  .. cpp:var:: rpc ModelMetadata(ModelMetadataRequest) returns
	//@@       (ModelMetadataResponse)
	//@@
	//@@     Get model metadata.
	//@@
	ModelMetadata(ctx context.Context, in *ModelMetadataRequest, opts ...grpc.CallOption) (*ModelMetadataResponse, error)
	//@@  .. cpp:var:: rpc ModelInfer(ModelInferRequest) returns
	//@@       (ModelInferResponse)
	//@@
	//@@     Perform inference using a specific model.
	//@@
	ModelInfer(ctx context.Context, in *ModelInferRequest, opts ...grpc.CallOption) (*ModelInferResponse, error)
	//@@  .. cpp:var:: rpc ModelStreamInfer(stream ModelInferRequest) returns
	//@@       (stream ModelStreamInferResponse)
	//@@
	//@@     Perform streaming inference.
	//@@
	ModelStreamInfer(ctx context.Context, opts ...grpc.CallOption) (GRPCInferenceService_ModelStreamInferClient, error)
	//@@  .. cpp:var:: rpc ModelConfig(ModelConfigRequest) returns
	//@@       (ModelConfigResponse)
	//@@
	//@@     Get model configuration.
	//@@
	ModelConfig(ctx context.Context, in *ModelConfigRequest, opts ...grpc.CallOption) (*ModelConfigResponse, error)
	//@@  .. cpp:var:: rpc ModelStatistics(
	//@@                     ModelStatisticsRequest)
	//@@                   returns (ModelStatisticsResponse)
	//@@
	//@@     Get the cumulative inference statistics for a model.
	//@@
	ModelStatistics(ctx context.Context, in *ModelStatisticsRequest, opts ...grpc.CallOption) (*ModelStatisticsResponse, error)
	//@@  .. cpp:var:: rpc RepositoryIndex(RepositoryIndexRequest) returns
	//@@       (RepositoryIndexResponse)
	//@@
	//@@     Get the index of model repository contents.
	//@@
	RepositoryIndex(ctx context.Context, in *RepositoryIndexRequest, opts ...grpc.CallOption) (*RepositoryIndexResponse, error)
	//@@  .. cpp:var:: rpc RepositoryModelLoad(RepositoryModelLoadRequest) returns
	//@@       (RepositoryModelLoadResponse)
	//@@
	//@@     Load or reload a model from a repository.
	//@@
	RepositoryModelLoad(ctx context.Context, in *RepositoryModelLoadRequest, opts ...grpc.CallOption) (*RepositoryModelLoadResponse, error)
	//@@  .. cpp:var:: rpc RepositoryModelUnload(RepositoryModelUnloadRequest)
	//@@       returns (RepositoryModelUnloadResponse)
	//@@
	//@@     Unload a model.
	//@@
	RepositoryModelUnload(ctx context.Context, in *RepositoryModelUnloadRequest, opts ...grpc.CallOption) (*RepositoryModelUnloadResponse, error)
	//@@  .. cpp:var:: rpc SystemSharedMemoryStatus(
	//@@                     SystemSharedMemoryStatusRequest)
	//@@                   returns (SystemSharedMemoryStatusResponse)
	//@@
	//@@     Get the status of all registered system-shared-memory regions.
	//@@
	SystemSharedMemoryStatus(ctx context.Context, in *SystemSharedMemoryStatusRequest, opts ...grpc.CallOption) (*SystemSharedMemoryStatusResponse, error)
	//@@  .. cpp:var:: rpc SystemSharedMemoryRegister(
	//@@                     SystemSharedMemoryRegisterRequest)
	//@@                   returns (SystemSharedMemoryRegisterResponse)
	//@@
	//@@     Register a system-shared-memory region.
	//@@
	SystemSharedMemoryRegister(ctx context.Context, in *SystemSharedMemoryRegisterRequest, opts ...grpc.CallOption) (*SystemSharedMemoryRegisterResponse, error)
	//@@  .. cpp:var:: rpc SystemSharedMemoryUnregister(
	//@@                     SystemSharedMemoryUnregisterRequest)
	//@@                   returns (SystemSharedMemoryUnregisterResponse)
	//@@
	//@@     Unregister a system-shared-memory region.
	//@@
	SystemSharedMemoryUnregister(ctx context.Context, in *SystemSharedMemoryUnregisterRequest, opts ...grpc.CallOption) (*SystemSharedMemoryUnregisterResponse, error)
	//@@  .. cpp:var:: rpc CudaSharedMemoryStatus(
	//@@                     CudaSharedMemoryStatusRequest)
	//@@                   returns (CudaSharedMemoryStatusResponse)
	//@@
	//@@     Get the status of all registered CUDA-shared-memory regions.
	//@@
	CudaSharedMemoryStatus(ctx context.Context, in *CudaSharedMemoryStatusRequest, opts ...grpc.CallOption) (*CudaSharedMemoryStatusResponse, error)
	//@@  .. cpp:var:: rpc CudaSharedMemoryRegister(
	//@@                     CudaSharedMemoryRegisterRequest)
	//@@                   returns (CudaSharedMemoryRegisterResponse)
	//@@
	//@@     Register a CUDA-shared-memory region.
	//@@
	CudaSharedMemoryRegister(ctx context.Context, in *CudaSharedMemoryRegisterRequest, opts ...grpc.CallOption) (*CudaSharedMemoryRegisterResponse, error)
	//@@  .. cpp:var:: rpc CudaSharedMemoryUnregister(
	//@@                     CudaSharedMemoryUnregisterRequest)
	//@@                   returns (CudaSharedMemoryUnregisterResponse)
	//@@
	//@@     Unregister a CUDA-shared-memory region.
	//@@
	CudaSharedMemoryUnregister(ctx context.Context, in *CudaSharedMemoryUnregisterRequest, opts ...grpc.CallOption) (*CudaSharedMemoryUnregisterResponse, error)
}
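
For reference, the closest existing RPC above is SystemSharedMemoryStatus, but as far as I can tell it only reports regions that were explicitly registered through the shared-memory API, not the overall /dev/shm usage of the Python backend. A rough sketch of calling it from the generated Go client (the import path of the generated stubs and the server address below are placeholders for my setup):

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"

	pb "example.com/internal/tritonpb" // placeholder: wherever the stubs above were generated
)

func main() {
	// Triton's gRPC endpoint listens on port 8001 by default.
	conn, err := grpc.Dial("localhost:8001", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("connect to triton: %v", err)
	}
	defer conn.Close()

	client := pb.NewGRPCInferenceServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Leaving the region name empty requests the status of all registered regions.
	resp, err := client.SystemSharedMemoryStatus(ctx, &pb.SystemSharedMemoryStatusRequest{})
	if err != nil {
		log.Fatalf("SystemSharedMemoryStatus: %v", err)
	}
	for name, region := range resp.GetRegions() {
		fmt.Printf("region %s: key=%s offset=%d byte_size=%d\n",
			name, region.GetKey(), region.GetOffset(), region.GetByteSize())
	}
}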


Phelan164 · Sep 15 '22

Hi @Phelan164, are you running the Triton Inference Server inside a Docker container? The default Docker shm size might not be large enough for some models, but it can be increased by passing --shm-size=16g to the docker run command.
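
For example (the image tag and model repository path below are just placeholders):

docker run --gpus all --rm --shm-size=16g \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:22.09-py3 \
  tritonserver --model-repository=/models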

kthui · Sep 19 '22

@kthui yes, I run the Triton Inference Server inside Docker and could increase the shm-size, but as I deploy more and more Python backend models it reaches the shm limit. So I want to check shm usage every time I deploy a model and undeploy some unused models if I have to.
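
What I have in mind is something like checking /dev/shm usage from inside the container before each deploy, since there is no Triton API for it, and then calling RepositoryModelUnload for unused models. A rough sketch (Linux only; the 90% threshold is arbitrary):

package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	// /dev/shm backs the Python backend's shared memory inside the container.
	var st syscall.Statfs_t
	if err := syscall.Statfs("/dev/shm", &st); err != nil {
		log.Fatalf("statfs /dev/shm: %v", err)
	}
	total := st.Blocks * uint64(st.Bsize)
	free := st.Bavail * uint64(st.Bsize)
	used := total - free
	fmt.Printf("/dev/shm: %d of %d bytes used (%.1f%%)\n",
		used, total, 100*float64(used)/float64(total))

	// Example threshold: leave headroom before loading another model,
	// otherwise unload something via RepositoryModelUnload first.
	if float64(used)/float64(total) > 0.9 {
		fmt.Println("shm nearly full: consider unloading unused models")
	}
}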

Phelan164 · Sep 20 '22

Closing issue due to lack of activity. Please re-open it if you would like to follow up.

jbkyang-nvi · Nov 22 '22