High CPU memory usage
🐛 Describe the bug
Why does TorchServe use CPU memory while loading a model onto the GPU?
I loaded a YOLOv5-small model and set workers to 1; GPU VRAM usage was 1100 MB, so I figured I could easily fit 12 workers and increased the worker count to 12. Watching memory usage, it climbed to 12 GB and then workers kept dying, even though the GPU still had more than 3 GB free.
I then checked CPU/memory usage via htop and noticed that CPU memory usage was huge: 3.56 GB for a single worker. It also scaled linearly: 2 workers took 6 GB and 3 workers took 8.44 GB. That explains the crashes, since my machine has around 29 GB of RAM; as soon as I scaled beyond 10 workers, they started to die even though GPU memory was still available.
Why does TorchServe need so much CPU memory when the model is already loaded in GPU memory? What does it use it for? This resource usage looks unreasonable. Is it normal, or am I missing something?
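For reference, the per-worker numbers above can be collected with a small psutil loop like the sketch below. This is not part of the original report; filtering on model_service_worker in the command line is an assumption about how the TorchServe worker processes are launched.

import psutil

# Sum the resident set size (RSS) of all TorchServe Python worker processes.
# Assumption: worker processes are started via ts/model_service_worker.py,
# so we filter on that string in the command line.
total_rss = 0.0
for proc in psutil.process_iter(["pid", "cmdline", "memory_info"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "model_service_worker" in cmdline and proc.info["memory_info"]:
        rss_gb = proc.info["memory_info"].rss / 1024 ** 3
        print(f"worker pid={proc.info['pid']} rss={rss_gb:.2f} GB")
        total_rss += rss_gb
print(f"total worker RSS: {total_rss:.2f} GB")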
Error logs
There was no error as such; the issue is more about high resource usage.
Installation instructions
Installed the rest of the packages manually and installed the JDK separately (sudo apt-get install -y openjdk-17-jdk).
Then installed TorchServe via pip.
Model Packaging
Used torch-model-archiver to create the .mar package.
config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
load_models=all
default_response_timeout=120
unregister_model_timeout=120
cors_allowed_origin=*
cors_allowed_methods=GET,POST,PUT,OPTIONS
cors_allowed_headers=*
max_request_size=10485760
enable_envvars_config=true
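An aside, not part of the original report: since the workers started dying once too many were registered, the worker count can be capped per model through the management API exposed on port 8081 in the config above. A minimal sketch, assuming the model is already registered; the model name yolov5s is a placeholder.

import requests

MANAGEMENT = "http://0.0.0.0:8081"
MODEL_NAME = "yolov5s"  # placeholder; use the name the model was registered under

# Scale the model to at most 2 workers instead of 12, keeping CPU RAM in check.
# synchronous=true makes the call block until the workers are up.
resp = requests.put(
    f"{MANAGEMENT}/models/{MODEL_NAME}",
    params={"min_worker": 1, "max_worker": 2, "synchronous": "true"},
)
print(resp.status_code, resp.text)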
Versions
------------------------------------------------------------------------------------------
Torchserve branch:
torchserve==0.7.1
torch-model-archiver==0.7.1
Python version: 3.9 (64-bit runtime)
Python executable: /home/carscanai1/miniconda3/envs/sdk-11.7/bin/python
Versions of relevant python libraries:
numpy==1.24.3
nvgpu==0.10.0
psutil==5.9.5
pytest==7.3.1
pytest-cov==4.0.0
pytest-dependency==0.5.1
pytest-randomly==3.12.0
requests==2.29.0
torch==1.13.1+cu117
torch-model-archiver==0.7.1
torchserve==0.7.1
torchvision==0.14.1+cu117
wheel==0.38.4
torch==1.13.1+cu117
**Warning: torchtext not present ..
torchvision==0.14.1+cu117
**Warning: torchaudio not present ..
Java Version:
OS: Ubuntu 18.04.6 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: N/A
CMake version: N/A
Is CUDA available: Yes
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: Tesla T4
Nvidia driver version: 515.43.04
cuDNN version: None
Repro instructions
The handler file was simple; it just used torch.hub.load() to initialize the model (a minimal sketch is shown after these steps).
Then used torch-model-archiver to archive the model.
torchserve --start --ncs --model-store ./model_store/ --ts-config ./config.properties --foreground
Everything works perfectly fine in the workers, with no errors; my question is only about the high CPU memory usage.
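For illustration, a minimal sketch of the kind of handler described above, assuming TorchServe's BaseHandler and the ultralytics/yolov5 hub entry point; this is not the reporter's actual handler, and only the initialization step is shown.

import torch
from ts.torch_handler.base_handler import BaseHandler


class Yolov5Handler(BaseHandler):
    """Illustrative handler: loads YOLOv5-small via torch.hub in initialize()."""

    def initialize(self, context):
        properties = context.system_properties
        gpu_id = properties.get("gpu_id")
        self.device = torch.device(
            f"cuda:{gpu_id}" if torch.cuda.is_available() and gpu_id is not None else "cpu"
        )
        # torch.hub.load downloads/caches the model code and weights;
        # the repo and model names here are assumptions for illustration.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.to(self.device)
        self.model.eval()
        self.initialized = True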
Possible Solution
No response
@PushpakBhoge could you share your handler code (i.e., the initialize function used for model loading)?
I faced the same problem. I have 10 instances with various models running on TorchServe, on CPU and CUDA. The server eats about 24 GB of RAM. Models on CUDA use a lot of CPU RAM (3.5 GB+). As I understand it, this is a CUDA issue, not a TorchServe or torch issue: https://discuss.pytorch.org/t/cpu-ram-usage-with-cuda-is-large-2-gb/117668
You can run this script with various devices.
from typing import List

import torch
import numpy as np
import psutil

DEVICE = "cpu"
BATCH_SIZE = 1
NUM_SAMPLES = 100


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Conv2d(3, 20, 3),
            torch.nn.Conv2d(20, 3, 3)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


def load_model(device: str) -> torch.nn.Module:
    model = Model()
    model.to(device)
    return model


def make_input(n: int) -> torch.Tensor:
    return torch.rand((n, 3, 640, 640))


@torch.no_grad()
def inference(model: torch.nn.Module, device: str, images: torch.Tensor, batch_size: int) -> None:
    number_of_batches = int(np.ceil(len(images) / batch_size))
    for i in range(number_of_batches):
        selected_images = images[i * batch_size: (i + 1) * batch_size]
        model(selected_images.to(device))


def print_memory(title: str) -> int:
    div = 1024 * 1024 * 1024
    print(f"=========== {title} ===========")
    print(f"TOTAL MEMORY {psutil.virtual_memory().total / div:.2f} Gb")
    print(f"AVAILABLE {psutil.virtual_memory().available / div:.2f} Gb")
    print(f"USED {psutil.virtual_memory().used / div:.2f} Gb")
    print()
    return psutil.virtual_memory().used


def main() -> None:
    start_memory = print_memory("START")
    model = load_model(DEVICE)
    images = make_input(NUM_SAMPLES)
    inference(model, DEVICE, images, BATCH_SIZE)
    end_memory = print_memory("END")
    print(f"Memory reserved for script {abs(end_memory - start_memory) / (1024 * 1024 * 1024):.2f} Gb")


if __name__ == "__main__":
    main()
Set DEVICE = "cpu"
:
=========== START =========== TOTAL MEMORY 31.23 Gb AVAILABLE 25.16 Gb USED 5.47 Gb
=========== END =========== TOTAL MEMORY 31.23 Gb AVAILABLE 24.63 Gb USED 6.00 Gb
Memory reserved for script 0.53 Gb
Set DEVICE = "cuda"
:
=========== START =========== TOTAL MEMORY 31.23 Gb AVAILABLE 25.16 Gb USED 5.48 Gb
=========== END =========== TOTAL MEMORY 31.23 Gb AVAILABLE 21.46 Gb USED 9.17 Gb
Memory reserved for script 3.70 Gb
@lxning sorry, I missed your message; I will send a minimal reproducible example within the next 24 hours.
@vilka-lab might be correct, since the setup I had was torch 1.10.1 + CUDA 11.3.
I upgraded it to 1.13.1 + CUDA 11.7 and the issue was contained a bit. It isn't solved, but usage dropped from 3-3.5 GB per worker to 2-2.5 GB.
@vilka-lab kudos for the well-written code! 😃
I have the same problem when serving 5 models. This is my current hardware:
CPU: i5-10400F
GPU: RTX 3060, 12 GB VRAM
RAM: 8 GB
I tested with 2 models and it eats about ~7 GB of the 8 GB RAM. Is there something I can improve?
@hungtrieu07 no, it seems like there is nothing we can do. After some consideration, it appears to be just a CUDA thing; if you do the same without CUDA, it does not consume that much memory.
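To see that the host-memory cost comes mainly from initializing the CUDA context rather than from the model itself, here is a minimal sketch (an editor's illustration, not from the thread, assuming psutil is installed) that measures the process RSS before and after CUDA is initialized.

import psutil
import torch


def rss_gb() -> float:
    # Resident set size of the current process, in GB.
    return psutil.Process().memory_info().rss / 1024 ** 3


print(f"RSS before CUDA init: {rss_gb():.2f} GB")

# Allocating a single tiny tensor on the GPU forces creation of the CUDA
# context and loads the CUDA/cuDNN runtime libraries into host memory.
if torch.cuda.is_available():
    _ = torch.zeros(1, device="cuda")
    torch.cuda.synchronize()
    print(f"RSS after CUDA init:  {rss_gb():.2f} GB")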