PaDiM-Anomaly-Detection-Localization-master
Inference time
Thanks for your effort! I have a question about PaDiM. The paper reports an average inference time of 0.23 sec with R18-Rd100. But in the test phase, computing the Mahalanobis distance between the train and test image vectors takes about 9 sec, even when I use a GPU. Any comments? Thanks!
Sorry, the implementation of Mahalanobis distance is not elegant and takes up most of the inference time, which may still have room for optimization.
Thanks for the reply! I think it would be faster to compute the Mahalanobis distance with a matrix multiplication instead of looping over the vectors.
Do you think it could be improved with the multiprocessing or joblib packages?
Do you mean
for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
    dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
    dist_list.append(dist)
This part takes a lot of time, right?
@xiahaifeng1995, @okokchoi, you could also move the following line into training and save its result together with the mean:
conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
So, in the training part:
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f, protocol=pickle.HIGHEST_PROTOCOL)
I replaced the following:
dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
with:
import scipy.spatial.distance as SSD
dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
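One way to check that this swap leaves the scores unchanged is to compare both forms on random data. This is only a sketch: the sizes `B` and `C` and the random covariance are assumptions made up for illustration, not values from the repository.

```python
import numpy as np
from scipy.spatial.distance import cdist, mahalanobis

rng = np.random.default_rng(0)
B, C = 4, 6                          # hypothetical batch size and channel count
samples = rng.normal(size=(B, C))    # embedding vectors at one spatial position
mean = rng.normal(size=C)
A = rng.normal(size=(C, C))
cov = A @ A.T + 0.01 * np.eye(C)     # symmetric positive definite covariance
cov_inv = np.linalg.inv(cov)

# Original approach: one Python-level call per sample.
loop_dist = np.array([mahalanobis(s, mean, cov_inv) for s in samples])

# Replacement: a single cdist call. The result has shape (B, 1), so flatten it.
cdist_dist = cdist(samples, mean[None, :], metric="mahalanobis", VI=cov_inv).ravel()
```

Note that `cdist` returns a 2-D array, so it needs flattening before it can stand in for the original list of scalars.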
Thanks a lot for your reply! I'm really sorry, but I think something is wrong with the code I modified:
for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i]
    dist = cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist_list.append(dist)
<Error>
Traceback (most recent call last):
  File "main_test.py", line 301, in <module>
    main()
  File "main_test.py", line 170, in main
    dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)
ValueError: axes don't match array
Each dist value has the same length, but something is wrong with dist_list.
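One likely cause, assuming `dist_list` is built exactly as in the loop above: `cdist` returns a 2-D array of shape `(B, 1)` rather than a flat list of length `B`, so the stacked array has three axes and `transpose(1, 0)` is given only two. The sketch below reproduces that shape with placeholder data (the sizes are made up) and flattens each result before stacking.

```python
import numpy as np

B, H, W = 2, 3, 3  # hypothetical sizes

# Stand-in for what cdist(X, mean[None, :]) returns at each position: shape (B, 1).
per_position = [np.zeros((B, 1)) for _ in range(H * W)]

stacked = np.array(per_position)      # shape (H*W, B, 1): three axes
try:
    stacked.transpose(1, 0)           # two axes given for a 3-D array
    failed = False
except ValueError:
    failed = True

# Flattening each result first gives (H*W, B), which transposes cleanly.
flat = np.array([d.ravel() for d in per_position])
dist_map = flat.transpose(1, 0).reshape(B, H, W)
```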
@okokchoi, did you compute conv_inv and save it?
In the training part, replace the covariance computation with:
for i in range(H * W):
    cov[:, :, i] = np.cov(embedding_vectors[:, :, i].numpy(), rowvar=False) + 0.01 * I
    conv_inv[:, :, i] = np.linalg.inv(cov[:, :, i])

# save learned distribution
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f, protocol=pickle.HIGHEST_PROTOCOL)
and in testing:
dist_list = []
for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i]  # already inverted in training; was np.linalg.inv(train_outputs[1][:, :, i])
    dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist = list(itertools.chain(*dist))  # flatten the (B, 1) result into a list of B values
    dist_list.append(dist)

dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)

# upsample
continue the rest of the code .......
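As a possible further step, not something proposed in this thread, the per-position inversion loop in training could be replaced by one batched call, since `np.linalg.inv` inverts the last two axes of a stacked array. The sketch below assumes the `(C, C, H*W)` layout used above; the sizes and random matrices are invented for the check.

```python
import numpy as np

rng = np.random.default_rng(0)
C, L = 5, 12                                        # hypothetical channels and H*W positions
A = rng.normal(size=(L, C, C))
cov = A @ A.transpose(0, 2, 1) + 0.01 * np.eye(C)   # (L, C, C), one SPD matrix per position
cov = cov.transpose(1, 2, 0)                        # (C, C, L), the layout used in this thread

# Loop version, as in the training snippet above.
inv_loop = np.empty_like(cov)
for i in range(L):
    inv_loop[:, :, i] = np.linalg.inv(cov[:, :, i])

# Batched version: move the position axis to the front, invert, move it back.
inv_batched = np.linalg.inv(cov.transpose(2, 0, 1)).transpose(1, 2, 0)
```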
I solved the problem: I had been loading the pkl file produced by the non-modified version. I have a question, @DeepKnowledge1: is the modified version faster than the original one? (Anyway, thank you for your help :) You are the best!)
I think so, please try it and share your findings
Ok I will 👍
@DeepKnowledge1 @okokchoi
I think it's pretty much the same. Besides the size of the feature map, the lines below are expensive:
dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
dist = list(itertools.chain(*dist))
Is there a way to run it in parallel?
I got a 3.5x speedup by using real process-based multiprocessing.
Awesome! Did you use the multiprocessing module in PyTorch?
Thank you for the code, @DeepKnowledge1. I tried your code and improved my inference time from 80 sec to 43 sec!
I tried using Cython with the code, but it didn't help much (possibly because SSD.cdist is already implemented in optimized C). The bottleneck in this code is SSD.cdist, as it has several loops within it. I then tried eliminating the loops altogether with vectorization.
Based on the Mahalanobis equation (see SciPy's documentation), I used einsum to multiply the 3-D arrays (the mean, inv_cov, and embedding vectors) without any looping. This reduced my inference time from 43 sec to 2 sec!
The code is below:
def calc_maha_dist_infer_vectorized(B, C, H, W, embedded_vector_model, embedding_vectors, dist_list):
    with tqdm(total=3, desc="Loading…", ascii=False, ncols=75) as pbar:
        # start = time.perf_counter()
        pbar.set_description("Extracting mean and cov from model...")
        pbar.refresh()
        mean = embedded_vector_model[0][:, :]
        mean_reshaped = np.reshape(mean, [1, C, H * W])
        pbar.update(1)

        # checkpoint1 = time.perf_counter()
        conv_inv = embedded_vector_model[1][:, :, :]  # np.linalg.inv(train_outputs[1][:, :, i])
        pbar.update(1)

        pbar.set_description("Calculating Mahalanobis Distance...")
        pbar.refresh()
        delta = embedding_vectors - mean_reshaped
        dist_list = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
        pbar.update(1)
        # = np.sqrt(np.einsum('nj,jk,nk->n', delta, conv_inv, delta))
    return dist_list
To improve this further, maybe the real process-based multiprocessing mentioned by @ingbeeedd could be added? I'd love to hear your thoughts.
By the way, I used this code for single-image inference, not for several images at a time, so the mean, inv_cov, and embedding_vectors arrays may be too large to compute the Mahalanobis distance in one pass. Some modifications may be needed to process the data in batches.
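One way to handle the batching concern is to apply the same einsum to a few images at a time. The sketch below is an assumption-laden illustration rather than code from the thread: `mahalanobis_map` and its `chunk` parameter are hypothetical names, the array layouts follow the shapes used above, and the result is checked against an explicit double loop on random data.

```python
import numpy as np

def mahalanobis_map(embeddings, mean, cov_inv, chunk=8):
    """embeddings: (N, C, L); mean: (C, L); cov_inv: (C, C, L). Returns (N, L)."""
    out = []
    for start in range(0, embeddings.shape[0], chunk):
        delta = embeddings[start:start + chunk] - mean[None]
        sq = np.einsum('njl,jkl,nkl->nl', delta, cov_inv, delta)
        out.append(np.sqrt(np.clip(sq, 0.0, None)))  # clip guards tiny negative round-off
    return np.concatenate(out, axis=0)

rng = np.random.default_rng(0)
N, C, L = 5, 4, 6                                    # hypothetical sizes
emb = rng.normal(size=(N, C, L))
mean = rng.normal(size=(C, L))
A = rng.normal(size=(L, C, C))
cov_inv = np.linalg.inv(A @ A.transpose(0, 2, 1) + 0.01 * np.eye(C)).transpose(1, 2, 0)

# Reference: explicit loop over images and positions.
ref = np.empty((N, L))
for n in range(N):
    for l in range(L):
        d = emb[n, :, l] - mean[:, l]
        ref[n, l] = np.sqrt(d @ cov_inv[:, :, l] @ d)

chunked = mahalanobis_map(emb, mean, cov_inv, chunk=2)
```

A smaller `chunk` lowers peak memory at the cost of more Python-level iterations; the right value depends on the feature-map size and available RAM.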
@GreatScherzo That's what I wanted to do: change the loop into a matrix calculation. I will make some modifications to it.
@fryegg @GreatScherzo I wrote it as follows.
import multiprocessing
from multiprocessing import Process

manager = multiprocessing.Manager()
cpu_core = 8
dist_list = manager.list()
for number in range(cpu_core):
    dist_list.append(manager.list())

def calculate_distance(number, start, end, train_outputs, embedding_vectors):
    global dist_list
    for i in range(start, end):
        mean = train_outputs[0][:, i]
        conv_inv = train_outputs[1][:, :, i]  # np.linalg.inv(train_outputs[1][:, :, i])
        dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
        dist = list(itertools.chain(*dist))
        dist_list[number].append(dist)

In the main function:
procs = []
start = time.time()
# note: this split assumes H * W is divisible by cpu_core
for number in range(cpu_core):
    s = number * (H * W // cpu_core)
    e = (number + 1) * (H * W // cpu_core)
    proc = Process(target=calculate_distance, args=(number, s, e, train_outputs, embedding_vectors))
    procs.append(proc)
    proc.start()

for proc in procs:
    proc.join()
print("time :", time.time() - start)

global dist_list
final_list = []
for number in range(cpu_core):
    final_list.extend(dist_list[number])

final_list = np.array(final_list).transpose(1, 0).reshape(B, H, W)
final_list = torch.tensor(final_list)
score_map = F.interpolate(final_list.unsqueeze(1), size=x.size(2), mode='bilinear', align_corners=False).squeeze().numpy()
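The start/end arithmetic above uses integer division, so any positions left over when `H * W` is not a multiple of `cpu_core` would never be assigned to a worker. A minimal sketch of one alternative follows; `split_ranges` is a hypothetical helper, not part of the posted code, and it relies on `np.array_split` to spread the remainder across workers.

```python
import numpy as np

def split_ranges(total, workers):
    """Return (start, end) pairs that cover range(total) with no gaps."""
    chunks = np.array_split(np.arange(total), workers)
    return [(int(c[0]), int(c[-1]) + 1) for c in chunks if len(c) > 0]

ranges = split_ranges(10, 4)   # 10 positions, 4 hypothetical workers
```

Each `(start, end)` pair could then be passed to `calculate_distance` in place of `s` and `e`.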
I'd appreciate it if you could give me your opinion.
@ingbeeedd Thank you very much for sharing your code! I haven't had time to test it yet, but I'll be sure to share the speed results once I've tried it!
@fryegg @GreatScherzo I computed the Mahalanobis distance on the GPU, and it is 24 times faster than before (CPU parallel processing was 3.5 times faster), so the GPU version is about 6 times faster than the CPU-parallel one.
@ingbeeedd Nice work! How did you calculate the Mahalanobis distance on the GPU? Did you convert the embedding vectors into tensors?
@fryegg The code is being cleaned up. I'll leave a comment as soon as it's organized.
@GreatScherzo @fryegg @DeepKnowledge1 @okokchoi @xiahaifeng1995 @prob1995 @sangkyuleeKOR
I've uploaded the code: https://github.com/ingbeeedd/PaDiM-EfficientNet :)
Hi @GreatScherzo,
Thanks for your improvement. It is faster, but the scores are different: normal images score higher than defective images. Do you have any explanation?
@GreatScherzo Thanks for your code.
It works fine with only one image, but with a batch the scores are very different. I think the error is in the einsum call, and I have no idea how to fix it :)
By the way, I fixed that.
The distance is now vectorized and works with one or many images. The inference time improved a lot.
ok, thanks