
High CPU memory usage

Open PushpakBhoge opened this issue 1 year ago • 5 comments

🐛 Describe the bug

Why does TorchServe use CPU memory while loading a model into the GPU?

I loaded a YOLOv5-small model and set workers to 1; GPU VRAM usage was 1100 MB, so I could easily fit 12 workers and increased the worker count to 12. Watching memory usage, it climbed to 12 GB and then the workers kept dying, even though the GPU still had more than 3 GB of free memory.

Then I checked CPU/memory usage via htop and noticed that CPU memory usage was huge: 3.56 GB for one worker! It also scaled linearly: 2 workers took 6 GB and 3 workers took 8.44 GB. Since my machine has around 29 GB of memory, that explains why workers started to die as soon as I scaled beyond 10, even though GPU memory was still available.

Why does TorchServe need so much CPU memory when the model is already loaded in GPU memory? What does it do with it? This resource usage looks unreasonable; is it normal, or am I missing something?
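For reference, one way to scale workers as described above is TorchServe's management API, which accepts a min_worker parameter per registered model; the model name yolov5s below is an assumption, since the registered name was not given:

curl -X PUT "http://localhost:8081/models/yolov5s?min_worker=12&synchronous=true"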

Error logs

There was no error as such; the issue is more about high resource usage.

Installation instructions

Installed the rest of the packages manually, installed the JDK separately (sudo apt-get install -y openjdk-17-jdk), then installed TorchServe via pip.

Model Packaging

Used torch-model-archiver to create the .mar package.

config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
load_models=all
default_response_timeout=120
unregister_model_timeout=120
cors_allowed_origin=*
cors_allowed_methods=GET,POST,PUT,OPTIONS
cors_allowed_headers=*
max_request_size=10485760
enable_envvars_config=true

Versions

------------------------------------------------------------------------------------------
Torchserve branch: 

torchserve==0.7.1
torch-model-archiver==0.7.1

Python version: 3.9 (64-bit runtime)
Python executable: /home/carscanai1/miniconda3/envs/sdk-11.7/bin/python

Versions of relevant python libraries:
numpy==1.24.3
nvgpu==0.10.0
psutil==5.9.5
pytest==7.3.1
pytest-cov==4.0.0
pytest-dependency==0.5.1
pytest-randomly==3.12.0
requests==2.29.0
torch==1.13.1+cu117
torch-model-archiver==0.7.1
torchserve==0.7.1
torchvision==0.14.1+cu117
wheel==0.38.4
torch==1.13.1+cu117
**Warning: torchtext not present ..
torchvision==0.14.1+cu117
**Warning: torchaudio not present ..

Java Version:


OS: Ubuntu 18.04.6 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: N/A
CMake version: N/A

Is CUDA available: Yes
CUDA runtime version: 11.7.64
GPU models and configuration: 
GPU 0: Tesla T4
Nvidia driver version: 515.43.04
cuDNN version: None

Repro instructions

The handler file was simple; it just used torch.hub.load() to initialize the model, roughly like the sketch below.
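A minimal sketch of what such a handler might look like (the actual handler code was not shared; the torch.hub repository and model names below are assumptions based on the YOLOv5-small mention above):

import torch
from ts.torch_handler.base_handler import BaseHandler


class YoloV5Handler(BaseHandler):
    def initialize(self, context):
        # Pick the GPU assigned to this worker, fall back to CPU.
        properties = context.system_properties
        gpu_id = properties.get("gpu_id")
        if torch.cuda.is_available() and gpu_id is not None:
            self.device = torch.device(f"cuda:{gpu_id}")
        else:
            self.device = torch.device("cpu")

        # Assumed: load YOLOv5-small from torch.hub, as described above.
        self.model = torch.hub.load("ultralytics/yolov5", "yolov5s")
        self.model.to(self.device)
        self.model.eval()
        self.initialized = True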

Then ran torch-model-archiver to archive the model.
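A sketch of that archiver invocation, assuming the handler file is named handler.py and the model is registered as yolov5s (neither name was given above):

torch-model-archiver --model-name yolov5s --version 1.0 --handler handler.py --export-path ./model_store/ --force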

torchserve --start --ncs --model-store ./model_store/ --ts-config ./config.properties  --foreground

Everything works perfectly fine in the workers, with no errors; my question is only about the high CPU memory usage.

Possible Solution

No response

PushpakBhoge avatar Apr 28 '23 15:04 PushpakBhoge

@PushpakBhoge could you share your handler code (i.e. the initialize function used for model loading)?

lxning avatar May 08 '23 21:05 lxning

I faced the same problem. I have 10 instances with various models running on TorchServe, on CPU and CUDA. The server eats about 24 GB of RAM, and the models on CUDA use a lot of CPU RAM (3.5 GB+). As I understand it, it's a CUDA issue, not a TorchServe or torch one. https://discuss.pytorch.org/t/cpu-ram-usage-with-cuda-is-large-2-gb/117668

You can run this script with various devices.

from typing import List

import torch
import numpy as np
import psutil

DEVICE = "cpu"
BATCH_SIZE = 1
NUM_SAMPLES = 100


class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Conv2d(3, 20, 3),
            torch.nn.Conv2d(20, 3, 3)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


def load_model(device: str) -> torch.nn.Module:
    model = Model()
    model.to(device)
    return model

def make_input(n: int) -> torch.Tensor:
    return torch.rand((n, 3, 640, 640))

@torch.no_grad()
def inference(model: torch.nn.Module, device: str, images: torch.Tensor, batch_size: int) -> None:
    number_of_batches = int(np.ceil(len(images) / batch_size))
    
    for i in range(number_of_batches):
        selected_images = images[i * batch_size: (i + 1) * batch_size]
        model(selected_images.to(device))
    
def print_memory(title: str) -> int:
    div = 1024 * 1024 * 1024
    print(f"=========== {title} ===========")
    print(f"TOTAL MEMORY {psutil.virtual_memory().total / div :.2f} Gb")
    print(f"AVAILABLE {psutil.virtual_memory().available / div:.2f} Gb")
    print(f"USED {psutil.virtual_memory().used / div:.2f} Gb")
    print()
    return psutil.virtual_memory().used

def main() -> None:
    start_memory = print_memory("START")
    model = load_model(DEVICE)
    images = make_input(NUM_SAMPLES)
    inference(model, DEVICE, images, BATCH_SIZE)
    end_memory = print_memory("END")

    print(f"Memory reserved for script {abs(end_memory - start_memory) / (1024 * 1024 * 1024):.2f} Gb")

if __name__ == "__main__":
    main()

Set DEVICE = "cpu":

=========== START ===========
TOTAL MEMORY 31.23 Gb
AVAILABLE 25.16 Gb
USED 5.47 Gb

=========== END ===========
TOTAL MEMORY 31.23 Gb
AVAILABLE 24.63 Gb
USED 6.00 Gb

Memory reserved for script 0.53 Gb

Set DEVICE = "cuda":

=========== START ===========
TOTAL MEMORY 31.23 Gb
AVAILABLE 25.16 Gb
USED 5.48 Gb

=========== END ===========
TOTAL MEMORY 31.23 Gb
AVAILABLE 21.46 Gb
USED 9.17 Gb

Memory reserved for script 3.70 Gb

vilka-lab avatar May 22 '23 14:05 vilka-lab

@lxning sorry, I missed your message; I will send minimal reproducible code in the next 24 hours.

@vilka-lab might be correct, though, since the setup I had was torch 1.10.1 + cu11.3.

I upgraded it to 1.13.1 + cu11.7 and the issue improved a bit. It isn't solved, but usage dropped from 3-3.5 GB per worker to 2-2.5 GB.
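For anyone trying to reproduce the comparison, the upgraded versions listed in the environment above can be installed roughly like this, using the standard PyTorch CUDA 11.7 wheel index:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117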

@vilka-lab kudos for the well-written code! 😃

PushpakBhoge512 avatar May 22 '23 16:05 PushpakBhoge512

I have the same problem when serving 5 models. This is my current hardware: CPU: i5-10400F, GPU: RTX 3060 with 12 GB VRAM, RAM: 8 GB.

I tested with 2 models and it eats about ~7 GB of the 8 GB of RAM. Is there anything I can improve?

hungtrieu07 avatar Jan 09 '24 13:01 hungtrieu07

@hungtrieu07 no, after some consideration it seems there is nothing we can do; it appears to be just a CUDA thing. If you do the same without CUDA, it does not consume that much memory.

PushpakBhoge512 avatar Jan 09 '24 13:01 PushpakBhoge512