fastertransformer_backend
Memory usage not going up with model instances
Hi,
I am using this backend for inference with a GPT-J model (a CodeGen checkpoint converted to GPT-J format, to be precise), and I'm trying to load more than one model instance to handle concurrent requests. However, as I increase the number of instances, GPU memory usage doesn't go up. The first instance takes about 6 GB of memory, but each subsequent instance adds only a tiny fraction of that. Was wondering if this is a bug?
Here are the relevant details of the config.pbtxt file:
instance_group [
  {
    count: 3
    kind: KIND_CPU
  }
]
parameters {
  key: "tensor_para_size"
  value: {
    string_value: "1"
  }
}
parameters {
  key: "pipeline_para_size"
  value: {
    string_value: "1"
  }
}
parameters {
  key: "data_type"
  value: {
    string_value: "fp16"
  }
}
Any help would be appreciated!
All instances share the same model weights, so each additional instance only allocates extra workspace for its computations.
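If you want to confirm this behavior, one quick check is to compare GPU memory usage across different instance counts. Below is a minimal sketch using pynvml; the library choice and the use of GPU index 0 are assumptions on my part, not part of the original setup:
import pynvml

# Query current GPU memory usage; run this once per instance-count setting
# to see that each extra instance adds only a small amount of workspace.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the model is on GPU 0
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"GPU 0 used memory: {info.used / 1024**2:.0f} MiB")
pynvml.nvmlShutdown()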