PatchCore_anomaly_detection

GPU memory usage is high during testing

Open • leolv131 opened this issue 3 years ago • 9 comments

Is GPU memory usage small during training but large during testing? Does the distance matrix take up most of the GPU memory? The function in question:

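import torch

def distance_matrix(x, y=None, p=2):  # pairwise distance of vectors
    y = x if y is None else y
    n = x.size(0)
    m = y.size(0)
    d = x.size(1)
    # expand() only creates views, but x - y and torch.pow each allocate a full (n, m, d) tensor
    x = x.unsqueeze(1).expand(n, m, d)
    y = y.unsqueeze(0).expand(n, m, d)
    # summing over the feature dim gives the p-th power of the distance, shape (n, m)
    dist = torch.pow(x - y, p).sum(2)
    return dist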

leolv131 • Aug 23 '21 09:08

Hi @leolv131, have you solved this problem?

XiaoPengZong • Sep 08 '21 06:09

Has this problem been solved? I've occasionally run into the same issue.

HoseinHashemi • Nov 03 '21 05:11

torch.pow() and sum() each keep their own full-size results, so they need too much GPU memory (the intermediate distance table is too large).

When I use torch.cdist(x, y, p) instead, it needs much less GPU memory.

So I now use torch.cdist. I'd like to hear other people's experience with cdist.

Please let me know your results with the cdist function.
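A small CPU sketch, with made-up sizes, checks that torch.cdist returns the same distances as distance_matrix(...) ** (1 / p) while only producing the (n, m) result. The tensor sizes and the 1e-3 tolerance are illustrative assumptions, not values from the repo; for p=2, cdist may use a matrix-multiplication shortcut internally, so tiny floating-point differences are expected.

```python
import torch


def distance_matrix(x, y=None, p=2):
    # Broadcast version from the issue: returns sum((x_i - y_j)^p) with no root,
    # and materializes full (n, m, d) temporaries for x - y and torch.pow.
    y = x if y is None else y
    n, m, d = x.size(0), y.size(0), x.size(1)
    xe = x.unsqueeze(1).expand(n, m, d)
    ye = y.unsqueeze(0).expand(n, m, d)
    return torch.pow(xe - ye, p).sum(2)


torch.manual_seed(0)
x = torch.randn(98, 64)   # small stand-in for test patch features (n, d)
y = torch.randn(204, 64)  # small stand-in for the memory bank (m, d)

p = 2
broadcast = distance_matrix(x, y, p) ** (1 / p)  # p-th root, as in KNN.predict
direct = torch.cdist(x, y, p=p)                  # (n, m) distances in one call

print(broadcast.shape, direct.shape)
print(torch.allclose(broadcast, direct, atol=1e-3))

# Elements in each (n, m, d) temporary of the broadcast version vs. the (n, m) result
print(x.size(0) * y.size(0) * x.size(1), direct.numel())  # 1279488 19992
```

The ratio between the two counts is exactly the feature dimension d, which is why the savings are so large with 1536-dimensional features.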

paining • Nov 05 '21 07:11

I tried torch.cdist(x, y, p), but GPU memory usage got even larger.

leolv131 • Nov 05 '21 08:11

As for whether it is solved: I changed coreset_sampling_ratio.

leolv131 • Nov 05 '21 08:11

When I ran the bottle class of MVTec-AD, I got a CUDA OOM error (parameters: load_size=224, input_size=224, coreset_sampling_ratio=0.01):

Exception has occurred: RuntimeError CUDA out of memory. Tried to allocate 7.35 GiB (GPU 0; 10.00 GiB total capacity; 7.63 GiB already allocated; 184.00 KiB free; 7.64 GiB reserved in total by PyTorch)

With a 1% coreset, the bottle class of MVTec-AD has 1638 features in the memory bank, so distance_matrix needs a 1638 × 784 × 1536 intermediate tensor of 4-byte floats (= 7,890,075,648 bytes, about 7.35 GiB, which matches the failed allocation above).
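The arithmetic can be checked directly. This sketch reuses the sizes quoted in the comment (1638 memory-bank features, 784 patches, 1536 feature dimensions, float32) and compares the broadcast temporary with the (n, m) table that torch.cdist returns; the helper name table_bytes is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def table_bytes(*shape, bytes_per_elem=BYTES_PER_FLOAT32):
    # Size in bytes of a dense tensor with the given shape
    total = bytes_per_elem
    for dim in shape:
        total *= dim
    return total


bank, patches, feat = 1638, 784, 1536  # values quoted in the comment above

broadcast_bytes = table_bytes(bank, patches, feat)  # (n, m, d) temporary
cdist_bytes = table_bytes(bank, patches)            # (n, m) distances only

print(f"{broadcast_bytes:,} bytes = {broadcast_bytes / 2**30:.2f} GiB")  # 7,890,075,648 bytes = 7.35 GiB
print(f"{cdist_bytes:,} bytes = {cdist_bytes / 2**20:.2f} MiB")          # 5,136,768 bytes = 4.90 MiB
```

Both sizes grow linearly with the memory-bank size, so lowering coreset_sampling_ratio shrinks them proportionally; only the broadcast version carries the extra factor of 1536.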

But when I use torch.cdist instead of distance_matrix, it runs with 2.4 GB of GPU memory.

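import torch

# NN is the base nearest-neighbour class this code extends (not shown in the thread)
class KNN(NN):

    def __init__(self, X=None, Y=None, k=3, p=2):
        self.k = k
        super().__init__(X, Y, p)

    def train(self, X, Y):
        super().train(X, Y)
        if Y is not None:
            self.unique_labels = self.train_label.unique()

    def predict(self, x):
        # Old version: builds an (n, m, d) intermediate tensor on the GPU
        # dist = distance_matrix(x, self.train_pts, self.p) ** (1 / self.p)
        # cdist returns the (n, m) distance table directly
        dist = torch.cdist(x, self.train_pts, self.p)
        # k smallest distances (and their indices) for each row of x
        knn = dist.topk(self.k, largest=False)
        return knn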

Please try again with this code and tell me how it goes.
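The same idea as KNN.predict can be written as a standalone function. This is a sketch under assumptions: the name knn_distances and the chunk_size option are hypothetical and not part of the repo. Chunking is an extra technique, not something proposed in this thread; it bounds peak memory to a (chunk_size, M) block instead of the full (N, M) table. torch.no_grad() is used because no gradients are needed at test time.

```python
import torch


@torch.no_grad()  # inference only: no autograd buffers are kept
def knn_distances(queries, memory_bank, k=3, p=2.0, chunk_size=None):
    """k smallest distances (and indices) from each query row to the memory bank.

    chunk_size is a hypothetical option, not part of the repo: when set, queries
    are processed in slices so only a (chunk_size, M) distance block exists at once.
    """
    if chunk_size is None:
        values, indices = torch.cdist(queries, memory_bank, p=p).topk(k, dim=1, largest=False)
        return values, indices
    all_values, all_indices = [], []
    for start in range(0, queries.size(0), chunk_size):
        block = torch.cdist(queries[start:start + chunk_size], memory_bank, p=p)
        values, indices = block.topk(k, dim=1, largest=False)
        all_values.append(values)
        all_indices.append(indices)
    return torch.cat(all_values), torch.cat(all_indices)


torch.manual_seed(0)
patches = torch.randn(784, 32)  # stand-in: 784 patch features of one test image
bank = torch.randn(500, 32)     # stand-in: coreset memory bank

full_vals, full_idx = knn_distances(patches, bank, k=3)
chunk_vals, chunk_idx = knn_distances(patches, bank, k=3, chunk_size=100)
print(full_vals.shape, torch.allclose(full_vals, chunk_vals, atol=1e-4))
```

With chunk_size=None this matches dist.topk(self.k, largest=False) in the class; the chunked path should return the same k distances up to floating-point rounding.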

paining • Nov 08 '21 00:11

Thank you, it is useful: it now needs 5 GB, whereas before it needed 15 GB. Last time I modified the code as follows, but it did not work. Why?

def distance_matrix(x, y=None, p=2):
    y = x if type(y) == type(None) else y

    n = x.size(0)
    m = y.size(0)
    d = x.size(1)

    x = x.unsqueeze(1).expand(n, m, d)
    y = y.unsqueeze(0).expand(n, m, d)

    # dist = torch.pow(x - y, p).sum(2)
    dist = torch.cdist(x, y, p)

    return dist

leolv131 • Nov 08 '21 11:11

As far as I know, torch.cdist expects inputs with the same batch size and the same feature (column) dimension, for example x.shape = (batch_size, number_of_X, feature_dimension) and y.shape = (batch_size, number_of_Y, feature_dimension). So if we call torch.cdist(x, self.train_pts, self.p), x.shape = (784, 1536) and self.train_pts.shape = (Memory_bank_size, 1536). Because torch.cdist broadcasts, these are treated as x ~ (1, 784, 1536) and y ~ (1, Memory_bank_size, 1536), and the result is a (784, Memory_bank_size) distance table.

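But your code expands x and y to (n, m, d) before calling torch.cdist, so cdist treats the first dimension as a batch of n separate problems and returns an (n, m, m) tensor instead of the (n, m) distance table. The result has the wrong shape and is much larger, which is why it does not work.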
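A tiny shape check, with made-up sizes n=6, m=4, d=8, contrasts calling torch.cdist on the 2-D inputs with calling it after the expand step from the modified distance_matrix.

```python
import torch

torch.manual_seed(0)
n, m, d = 6, 4, 8
x = torch.randn(n, d)  # stand-in for test patch features
y = torch.randn(m, d)  # stand-in for the memory bank

# 2-D inputs are accepted directly: the result is the (n, m) distance table
good = torch.cdist(x, y, p=2)

# Expanding first turns dim 0 into a batch of n problems, each (m, d) vs (m, d)
xe = x.unsqueeze(1).expand(n, m, d)
ye = y.unsqueeze(0).expand(n, m, d)
bad = torch.cdist(xe, ye, p=2)

print(good.shape)  # torch.Size([6, 4])
print(bad.shape)   # torch.Size([6, 4, 4])
# Every slice bad[:, j, :] just repeats the (n, m) table m times over
print(torch.allclose(bad[:, 0, :], good), torch.allclose(bad[:, 3, :], good))
```

For the bottle example above (n = 784, m = 1638), the expanded version's output alone would hold 784 × 1638 × 1638 = 2,103,506,496 floats, about 7.8 GiB in float32.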

My English is not very good, so I did not explain this clearly before. Sorry.... (^^);;;

paining • Nov 09 '21 02:11

Thank you!

leolv131 • Nov 09 '21 02:11