PatchCore_anomaly_detection
GPU memory usage is high during testing.
Is GPU memory usage low during training but high during testing? Does the distance matrix take up a large amount of GPU memory? This is the function in question:
def distance_matrix(x, y=None, p=2):  # pairwise distance of vectors
    y = x if type(y) == type(None) else y
    n = x.size(0)
    m = y.size(0)
    d = x.size(1)
    x = x.unsqueeze(1).expand(n, m, d)
    y = y.unsqueeze(0).expand(n, m, d)
    dist = torch.pow(x - y, p).sum(2)
    return dist
hi, @leolv131 have you solved this problem?
Is this problem solved? I've faced the same issue sometimes.
torch.pow() and sum() each materialize their own intermediate results, so they need too much GPU memory (the distance table is too large).
When I use torch.cdist(x, y, p), it needs much less GPU memory.
So I use torch.cdist now. I would like to hear how cdist works for other people.
Please share your results if you try the cdist function.
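To make the suggested swap concrete, here is a small sketch that checks the two approaches agree on toy data. The tensor sizes are arbitrary and chosen only for illustration; the distance_matrix function is the one quoted at the top of the thread, and the `** 0.5` mirrors the `** (1 / self.p)` step for p=2.

```python
import torch


def distance_matrix(x, y=None, p=2):
    # Approach from the thread: builds an (n, m, d) intermediate tensor.
    y = x if y is None else y
    n, m, d = x.size(0), y.size(0), x.size(1)
    x = x.unsqueeze(1).expand(n, m, d)
    y = y.unsqueeze(0).expand(n, m, d)
    return torch.pow(x - y, p).sum(2)


torch.manual_seed(0)
x = torch.randn(8, 16)  # 8 query vectors (illustrative size)
y = torch.randn(5, 16)  # 5 memory-bank vectors (illustrative size)

old = distance_matrix(x, y, p=2) ** 0.5  # take the root to get the L2 distance
new = torch.cdist(x, y, p=2)             # returns the (8, 5) table directly

same_shape = old.shape == new.shape
close = torch.allclose(old, new, atol=1e-4)
print(same_shape, close)
```

On inputs this small both calls return the same (n, m) table up to floating-point error; the difference discussed in this thread is only in how much intermediate memory each one needs.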
I tried torch.cdist(x, y, p), but GPU memory usage was even larger.
I worked around it by changing coreset_sampling_ratio.
When I ran the bottle class of MVTec-AD, I got a CUDA OOM error (parameters: load_size=224, input_size=224, coreset_sampling_ratio=0.01).
Exception has occurred: RuntimeError CUDA out of memory. Tried to allocate 7.35 GiB (GPU 0; 10.00 GiB total capacity; 7.63 GiB already allocated; 184.00 KiB free; 7.64 GiB reserved in total by PyTorch)
Because 1% of the bottle class in MVTec-AD gives 1638 features in the memory bank, distance_matrix needs a 1638 x 784 x 1536 table of 4-byte floats (= 7,890,075,648 bytes).
But when I use torch.cdist instead of distance_matrix, it runs with 2.4 GB of GPU memory.
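The arithmetic above can be reproduced with a few lines of plain Python. This is only a sketch of the size estimate: the helper name table_bytes is hypothetical, and the numbers (1638, 784, 1536, 4 bytes) are taken from the post.

```python
def table_bytes(n_bank, n_patches, dim, bytes_per_elem=4):
    """Size in bytes of a dense (n_bank, n_patches, dim) float32 tensor."""
    return n_bank * n_patches * dim * bytes_per_elem


# Intermediate tensor built by distance_matrix (numbers from the post).
full = table_bytes(1638, 784, 1536)
full_gib = full / 2**30

# The final distance table only has n_bank x n_patches entries.
result_only = 1638 * 784 * 4
result_mib = result_only / 2**20

print(f"intermediate: {full:,} bytes = {full_gib:.2f} GiB")
print(f"final table:  {result_only:,} bytes = {result_mib:.1f} MiB")
```

The intermediate size matches the 7.35 GiB allocation in the error message, while the final distance table itself is only a few MiB.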
class KNN(NN):
    def __init__(self, X=None, Y=None, k=3, p=2):
        self.k = k
        super().__init__(X, Y, p)

    def train(self, X, Y):
        super().train(X, Y)
        if type(Y) != type(None):
            self.unique_labels = self.train_label.unique()

    def predict(self, x):
        # dist = distance_matrix(x, self.train_pts, self.p) ** (1 / self.p)
        dist = torch.cdist(x, self.train_pts, self.p)
        knn = dist.topk(self.k, largest=False)
        return knn
Please try again with this code and let me know how it goes.
Thank you, this is useful. It now needs 5 GB; before it needed 15 GB. Last time I modified the code as follows, but it did not work. Why?
def distance_matrix(x, y=None, p=2):
    y = x if type(y) == type(None) else y
    n = x.size(0)
    m = y.size(0)
    d = x.size(1)
    x = x.unsqueeze(1).expand(n, m, d)
    y = y.unsqueeze(0).expand(n, m, d)
    # dist = torch.pow(x - y, p).sum(2)
    dist = torch.cdist(x, y, p)
    return dist
As far as I know, torch.cdist needs inputs that have the same feature (column) dimension and the same batch size.
For example, x.shape = (batch_size, number_of_X, feature_dimension) and y.shape = (batch_size, number_of_Y, feature_dimension).
So if we call torch.cdist(x, self.train_pts, self.p), then x.shape = (784, 1536) and self.train_pts.shape = (Memory_bank_size, 1536).
torch.cdist broadcasts over the batch dimension, so these are treated like x ~ (1, 784, 1536) and y ~ (1, Memory_bank_size, 1536).
But your code expands x and y to (n, m, d) before calling torch.cdist, so cdist treats n as the batch dimension and returns an (n, m, m) tensor instead of the (n, m) distance table. That is why it does not work.
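The shape difference can be seen on toy tensors. This is a minimal sketch with made-up sizes (n=4, m=6, d=3), not the sizes from the actual run:

```python
import torch

torch.manual_seed(0)
n, m, d = 4, 6, 3        # illustrative sizes only
x = torch.randn(n, d)    # stands in for the (784, 1536) test features
y = torch.randn(m, d)    # stands in for the (Memory_bank_size, 1536) bank

# Passing the 2-D tensors directly gives the (n, m) distance table.
direct = torch.cdist(x, y, p=2)

# Expanding first, as in the modified distance_matrix, makes cdist read
# the leading dimension as a batch, so the output becomes (n, m, m).
x_exp = x.unsqueeze(1).expand(n, m, d)
y_exp = y.unsqueeze(0).expand(n, m, d)
batched = torch.cdist(x_exp, y_exp, p=2)

print(direct.shape)   # torch.Size([4, 6])
print(batched.shape)  # torch.Size([4, 6, 6])

# Every slice along the middle axis just repeats the same row of distances.
redundant = torch.allclose(batched[:, 0, :], direct, atol=1e-5)
print(redundant)
```

So the expanded call carries m redundant copies of each row of the table, which is also why it can use more memory than passing x and y directly.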
My English is not very good, so I did not explain this clearly before. Sorry.... (^^);;;
thank you