PaDiM-Anomaly-Detection-Localization-master Inference time

Thanks for your effort! I have a question about PaDiM. The paper reports an average inference time of 0.23 s with R18-Rd100, but in the test phase, computing the Mahalanobis distance between the train and test embedding vectors takes about 9 s when I use a GPU. Any comments? Thanks!

Feb 02 '21 13:02 sangkyuleeKOR

Sorry, the implementation of the Mahalanobis distance is not elegant and takes up most of the inference time; there may still be room for optimization.

Feb 03 '21 07:02 xiahaifeng1995

Thanks for the reply! I think it would be faster to compute the Mahalanobis distance with matrix multiplication instead of calculating it vector by vector in a for loop.
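A minimal NumPy sketch of that idea for a single feature-map position, assuming the samples at that position are stacked as an (N, C) array: the quadratic form (x - μ)ᵀ Σ⁻¹ (x - μ) for all N rows can be computed with one matrix product instead of N separate SciPy calls. The function names and the random data are illustrative only, not code from the repository.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis


def mahalanobis_loop(samples, mean, cov_inv):
    """Reference: one SciPy call per sample, as in the original loop."""
    return np.array([mahalanobis(x, mean, cov_inv) for x in samples])


def mahalanobis_matmul(samples, mean, cov_inv):
    """All N distances at once; samples: (N, C), mean: (C,), cov_inv: (C, C)."""
    delta = samples - mean  # (N, C), mean broadcast over rows
    # row-wise quadratic form: sum_jk delta[n, j] * cov_inv[j, k] * delta[n, k]
    sq = np.sum((delta @ cov_inv) * delta, axis=1)
    return np.sqrt(sq)


# placeholder data for one position: 5 samples with 16 channels
rng = np.random.default_rng(0)
C, N = 16, 5
samples = rng.normal(size=(N, C))
mean = rng.normal(size=C)
A = rng.normal(size=(C, C))
cov_inv = np.linalg.inv(A @ A.T + 0.01 * np.eye(C))  # inverse of a regularized SPD covariance
d_loop = mahalanobis_loop(samples, mean, cov_inv)
d_fast = mahalanobis_matmul(samples, mean, cov_inv)
```

The per-position Python loop over H*W would still remain; this only removes the inner loop over samples.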

Feb 04 '21 01:02 sangkyuleeKOR

Do you think it could be improved with the multiprocessing or joblib packages?

Feb 15 '21 09:02 DeepKnowledge1

Do you mean

        from scipy.spatial.distance import mahalanobis

        dist_list = []
        for i in range(H * W):
            mean = train_outputs[0][:, i]
            conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
            dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
            dist_list.append(dist)

This part takes a lot of time, right?

Mar 25 '21 08:03 okokchoi

@xiahaifeng1995, @okokchoi, you could also move the following line into the training step and save its result together with the mean:

conv_inv = np.linalg.inv(train_outputs[1][:, :, i])

So, in the training part:

train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f, protocol=pickle.HIGHEST_PROTOCOL)
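As a side note on the training step: in the original code the covariance is stored as a (C, C, H*W) array, and `np.linalg.inv` also accepts a stack of matrices shaped (..., C, C). So the inverses can be precomputed in one batched call instead of one call per position. A minimal sketch; `invert_covariances` is a hypothetical helper name and the covariances are built from random placeholder data, regularized with 0.01 * I as elsewhere in this thread.

```python
import numpy as np


def invert_covariances(cov):
    """Invert every per-position covariance in one batched call.

    cov: (C, C, L) array, one C x C covariance per feature-map position.
    Returns the inverses in the same (C, C, L) layout.
    """
    stacked = np.moveaxis(cov, -1, 0)  # (L, C, C): np.linalg.inv works on the last two axes
    inv = np.linalg.inv(stacked)       # (L, C, C)
    return np.moveaxis(inv, 0, -1)     # back to (C, C, L)


# placeholder data: L regularized covariances from 20 hypothetical samples each
rng = np.random.default_rng(0)
C, L = 8, 6
feats = rng.normal(size=(L, 20, C))
cov = np.stack([np.cov(f, rowvar=False) + 0.01 * np.eye(C) for f in feats], axis=-1)  # (C, C, L)
conv_inv = invert_covariances(cov)
```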

I replaced the following:

dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]

with:

import scipy.spatial.distance as SSD

dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
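A quick way to check that the swap is safe: for one position, a single `cdist` call gives the same distances as the per-sample `mahalanobis` loop, but it returns an (N, 1) matrix (N samples against one mean) rather than a flat list. The data below is random and purely illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist, mahalanobis

rng = np.random.default_rng(1)
C, N = 8, 4
samples = rng.normal(size=(N, C))  # hypothetical embeddings at one position, (N, C)
mean = rng.normal(size=C)          # hypothetical mean at that position, (C,)
A = rng.normal(size=(C, C))
conv_inv = np.linalg.inv(A @ A.T + 0.01 * np.eye(C))

# one cdist call for all N samples: an (N, 1) matrix
d_cdist = cdist(samples, mean[None, :], metric='mahalanobis', VI=conv_inv)
# the original per-sample loop: a flat array of N values
d_loop = np.array([mahalanobis(x, mean, conv_inv) for x in samples])
```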

Mar 25 '21 08:03 DeepKnowledge1

Thanks a lot for your reply! I'm sorry, but I think something is wrong with my modified code:

            for i in range(H * W):
                mean = train_outputs[0][:, i]
                conv_inv = train_outputs[1][:, :, i]
                dist = cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
                dist_list.append(dist)

Error:
Traceback (most recent call last):
  File "main_test.py", line 301, in <module>
    main()
  File "main_test.py", line 170, in main
    dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)
ValueError: axes don't match array

Each dist value has the same length, but something is wrong with dist_list.

Mar 25 '21 10:03 okokchoi

@okokchoi, did you compute conv_inv and save it?

See the training part, and replace it with:

I = np.identity(C)
cov = np.zeros((C, C, H * W))
conv_inv = np.zeros((C, C, H * W))  # inverse covariance per position, saved instead of cov
for i in range(H * W):
    cov[:, :, i] = np.cov(embedding_vectors[:, :, i].numpy(), rowvar=False) + 0.01 * I
    conv_inv[:, :, i] = np.linalg.inv(cov[:, :, i])
# save learned distribution
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f, protocol=pickle.HIGHEST_PROTOCOL)

and in testing:

import itertools
import scipy.spatial.distance as SSD

dist_list = []

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i]  # already inverted at training time

    dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist = list(itertools.chain(*dist))  # flatten the (B, 1) result into B distances
    dist_list.append(dist)

dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)

# upsample
# ... continue with the rest of the code
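The flattening step matters because `cdist` returns a (B, 1) array per position. Without it, `np.array(dist_list)` becomes three-dimensional, and `transpose(1, 0)` then fails with the same "axes don't match array" error as in the traceback earlier in this thread. A small sketch with placeholder numbers; `d[:, 0]` (or `d.ravel()`) is an equivalent, simpler way to flatten than `itertools.chain`.

```python
import numpy as np

B, H, W = 2, 3, 4
# hypothetical per-position cdist outputs: H*W arrays of shape (B, 1)
per_position = [np.full((B, 1), float(i)) for i in range(H * W)]

stacked = np.array(per_position)  # shape (H*W, B, 1): three axes
try:
    stacked.transpose(1, 0)       # only two axes given for a 3-D array
    raised = False
except ValueError:
    raised = True

# dropping the singleton column restores the (H*W, B) layout the code expects
fixed = np.array([d[:, 0] for d in per_position])   # (H*W, B)
score_map = fixed.transpose(1, 0).reshape(B, H, W)  # (B, H, W)
```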

Mar 25 '21 11:03 DeepKnowledge1

I solved the problem: I had simply loaded the pkl file from the non-modified version. I have a question, @DeepKnowledge1: is the modified version faster than the original one? (Anyway, thank you for your help :) You are the best!)

Mar 25 '21 11:03 okokchoi

I think so. Please try it and share your findings.

Mar 25 '21 12:03 DeepKnowledge1

Ok I will 👍

Mar 25 '21 12:03 okokchoi

@DeepKnowledge1 @okokchoi

I think it's pretty much the same. Besides the size of the feature map, the following lines are heavy:

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
dist = list(itertools.chain(*dist))

Is there a way to run it in parallel?

May 26 '21 08:05 ingbeeedd

I got a 3.5x speedup by using process-based multiprocessing.

May 27 '21 06:05 ingbeeedd

Awesome! Did you use the multiprocessing module in PyTorch?

May 27 '21 11:05 fryegg

Thank you for the code, @DeepKnowledge1. I tried your code and was able to reduce my inference time from 80 s to 43 s!

I tried using Cython on the code, but it didn't help much (possibly because SSD.cdist is already optimized in C). The bottleneck in this code is SSD.cdist and the loops around it, so I tried eliminating the loops altogether with vectorization.

Based on the Mahalanobis distance formula (see SciPy's documentation), I used einsum to multiply the 3D arrays (the mean, the inverse covariance, and the embedding vectors) without any looping. This reduced my inference time from 43 s to 2 s!
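For reference, the quantity the einsum computes, with the axes named as in the function that follows (n: image, j and k: channel, l: feature-map position, δ = embedding_vectors - mean, Σ_l⁻¹ = conv_inv[:, :, l]). The subscript string 'njl,jkl,nkl->nl' is exactly the double sum on the right:

$$
d_{n,l} = \sqrt{(\mathbf{x}_{n,l}-\boldsymbol{\mu}_l)^\top \Sigma_l^{-1} (\mathbf{x}_{n,l}-\boldsymbol{\mu}_l)}
= \sqrt{\sum_{j=1}^{C}\sum_{k=1}^{C} \delta_{n,j,l}\,\bigl(\Sigma_l^{-1}\bigr)_{j,k}\,\delta_{n,k,l}}
$$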

The code is below:

import numpy as np
from tqdm import tqdm

def calc_maha_dist_infer_vectorized(B, C, H, W, embedded_vector_model, embedding_vectors, dist_list):
    # embedded_vector_model = [mean (C, H*W), conv_inv (C, C, H*W)] saved at training time
    # embedding_vectors: (B, C, H*W) NumPy array; the dist_list argument is not used
    with tqdm(total=3, desc="Loading…", ascii=False, ncols=75) as pbar:
        # start = time.perf_counter()

        pbar.set_description("Extracting mean and cov from model...")
        pbar.refresh()
        mean = embedded_vector_model[0][:, :]
        mean_reshaped = np.reshape(mean, [1, C, H * W])
        pbar.update(1)

        # checkpoint1 = time.perf_counter()
        conv_inv = embedded_vector_model[1][:, :, :]  # already inverted at training time
        pbar.update(1)

        pbar.set_description("Calculating Mahalanobis Distance...")
        pbar.refresh()
        delta = embedding_vectors - mean_reshaped
        # sum over channels j, k of delta[n, j, l] * conv_inv[j, k, l] * delta[n, k, l]
        dist_list = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
        pbar.update(1)
        # = np.sqrt(np.einsum('nj,jk,nk->n', delta, conv_inv, delta))

    # result has shape (B, H*W), already batch-first
    return dist_list
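A small self-check, on random placeholder data, that the einsum 'njl,jkl,nkl->nl' matches an explicit per-image, per-position loop for a batch of several images. `maha_einsum` and `maha_loop` are hypothetical names for a stripped-down version of the function above and a reference loop. One observation about shapes, not a confirmed diagnosis of the batch issue reported later in the thread: the einsum result is already laid out as (B, H*W), whereas the loop-based version builds an (H*W, B) list and then calls `transpose(1, 0)`. Applying that same transpose to the einsum output would not raise an error (the sizes still match) but would scramble the score map.

```python
import numpy as np


def maha_einsum(embedding_vectors, mean, conv_inv):
    """embedding_vectors: (B, C, L), mean: (C, L), conv_inv: (C, C, L) -> (B, L)."""
    delta = embedding_vectors - mean[None, :, :]
    return np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))


def maha_loop(embedding_vectors, mean, conv_inv):
    """Reference: explicit loop over images n and positions l."""
    B, C, L = embedding_vectors.shape
    out = np.empty((B, L))
    for n in range(B):
        for l in range(L):
            d = embedding_vectors[n, :, l] - mean[:, l]
            out[n, l] = np.sqrt(d @ conv_inv[:, :, l] @ d)
    return out


rng = np.random.default_rng(2)
B, C, H, W = 3, 6, 2, 5
L = H * W
emb = rng.normal(size=(B, C, L))  # placeholder embeddings for a batch of 3 images
mean = rng.normal(size=(C, L))
A = rng.normal(size=(L, C, C))
# L regularized SPD matrices, inverted in one batched call, stored as (C, C, L)
conv_inv = np.moveaxis(np.linalg.inv(A @ np.swapaxes(A, 1, 2) + 0.01 * np.eye(C)), 0, -1)

dist = maha_einsum(emb, mean, conv_inv)  # (B, L): batch-first
score_map = dist.reshape(B, H, W)        # no transpose(1, 0) needed here
```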

To improve further, perhaps the process-based multiprocessing mentioned by @ingbeeedd could be added? I'd love to hear your thoughts.

By the way, I used this code for single-image inference, not for multiple images at a time, so with more data the mean, inv_cov and embedding_vectors arrays may be too large to compute the Mahalanobis distance in one shot. Some modifications may be needed to process the data in batches.
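One possible way to do that, sketched under the assumption that memory rather than compute is the limit: run the same einsum over blocks of feature-map positions, so the temporary `delta` array (and any intermediates einsum creates) covers only `chunk` positions at a time. This also helps in the single-image case, where batching over images would not. `maha_chunked` and its `chunk` parameter are hypothetical.

```python
import numpy as np


def maha_chunked(embedding_vectors, mean, conv_inv, chunk=1024):
    """Mahalanobis distance over blocks of positions.

    embedding_vectors: (B, C, L), mean: (C, L), conv_inv: (C, C, L) -> (B, L)
    chunk: number of positions processed per block (tuning knob).
    """
    B, C, L = embedding_vectors.shape
    out = np.empty((B, L), dtype=np.result_type(embedding_vectors, conv_inv))
    for s in range(0, L, chunk):
        e = min(s + chunk, L)
        delta = embedding_vectors[:, :, s:e] - mean[None, :, s:e]  # (B, C, e - s)
        out[:, s:e] = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv[:, :, s:e], delta))
    return out


# placeholder data: compare against computing everything at once
rng = np.random.default_rng(3)
B, C, L = 2, 5, 7
emb = rng.normal(size=(B, C, L))
mean = rng.normal(size=(C, L))
A = rng.normal(size=(L, C, C))
conv_inv = np.moveaxis(np.linalg.inv(A @ np.swapaxes(A, 1, 2) + 0.01 * np.eye(C)), 0, -1)
delta = emb - mean[None, :, :]
full = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
chunked = maha_chunked(emb, mean, conv_inv, chunk=3)  # blocks of 3 + 3 + 1 positions
```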

May 28 '21 00:05 GreatScherzo

@GreatScherzo That's exactly what I wanted to do: replace the loop with a matrix computation. I will apply some modifications to this.

May 28 '21 00:05 fryegg

@fryegg @GreatScherzo I have written the following:

import itertools
import time
from multiprocessing import Manager, Process

import numpy as np
import scipy.spatial.distance as SSD
import torch
import torch.nn.functional as F

cpu_core = 8

def calculate_distance(number, start, end, train_outputs, embedding_vectors, dist_list):
    # each worker handles positions [start, end) and appends to its own sub-list
    for i in range(start, end):
        mean = train_outputs[0][:, i]
        conv_inv = train_outputs[1][:, :, i]  # already inverted at training time
        dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
        dist = list(itertools.chain(*dist))
        dist_list[number].append(dist)

main function:

# on Windows/macOS (spawn start method), run this under if __name__ == '__main__':
manager = Manager()
dist_list = manager.list()
for number in range(cpu_core):
    dist_list.append(manager.list())

procs = []
start = time.time()
chunk = H * W // cpu_core
for number in range(cpu_core):
    s = number * chunk
    # the last worker also takes the H*W % cpu_core leftover positions
    e = H * W if number == cpu_core - 1 else (number + 1) * chunk
    proc = Process(target=calculate_distance, args=(number, s, e, train_outputs, embedding_vectors, dist_list))
    procs.append(proc)
    proc.start()

for proc in procs:
    proc.join()

print("time :", time.time() - start)

# concatenate the workers' results in position order
final_list = []
for number in range(cpu_core):
    final_list.extend(dist_list[number])

final_list = np.array(final_list).transpose(1, 0).reshape(B, H, W)
final_list = torch.tensor(final_list)
score_map = F.interpolate(final_list.unsqueeze(1), size=x.size(2), mode='bilinear', align_corners=False).squeeze().numpy()

I'd appreciate it if you could give me your opinion.
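One detail worth checking in any version of this split: if H*W is not a multiple of cpu_core, computing start/end as number * (H*W // cpu_core) leaves the last H*W % cpu_core positions unassigned, and the later reshape(B, H, W) then fails because rows are missing. A small hypothetical helper that always covers every position, using `np.array_split`:

```python
import numpy as np


def split_ranges(total, n_workers):
    """Split positions 0..total-1 into contiguous (start, end) ranges.

    np.array_split spreads the total % n_workers leftover positions over the
    first few ranges instead of dropping them.
    """
    bounds = np.array_split(np.arange(total), n_workers)
    return [(int(b[0]), int(b[-1]) + 1) for b in bounds if len(b) > 0]


H, W, cpu_core = 7, 9, 8  # hypothetical sizes: 63 positions, 8 workers
ranges = split_ranges(H * W, cpu_core)
# the floor-division split, for comparison: its last range stops short of H*W
naive = [(k * (H * W // cpu_core), (k + 1) * (H * W // cpu_core)) for k in range(cpu_core)]
```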

May 28 '21 03:05 ingbeeedd

@ingbeeedd Thank you very much for sharing your code! I haven't had time to test it yet, but I'll be sure to share the speed results once I've tried it!

May 31 '21 03:05 GreatScherzo

@fryegg @GreatScherzo I computed the Mahalanobis distance on the GPU, and it's 24 times faster than before (CPU parallel processing gave 3.5 times), so it is roughly 6 to 7 times faster than the CPU-parallel version (24 / 3.5 ≈ 6.9).

Jun 02 '21 04:06 ingbeeedd

@ingbeeedd Nice work! How did you calculate the Mahalanobis distance on the GPU? Did you convert the embedding vectors into a tensor?

Jun 03 '21 00:06 fryegg

@fryegg I'm cleaning up the code. I'll leave a comment as soon as it's organized.

Jun 09 '21 02:06 ingbeeedd

@GreatScherzo @fryegg @DeepKnowledge1 @okokchoi @xiahaifeng1995 @prob1995 @sangkyuleeKOR

I've put the code up at https://github.com/ingbeeedd/PaDiM-EfficientNet :)

Jul 15 '21 14:07 ingbeeedd

Hi @GreatScherzo,

Thanks for your improvement. It is faster, but the scores are different: the scores for the normal images are higher than those for the defective images. Do you have any explanation?

Nov 20 '21 08:11 DeepKnowledge1

@GreatScherzo Thanks for your code.

It works fine with only one image, but with a batch the scores are very different. I think the error is in the einsum call, but I have no idea how to fix it :)

Nov 23 '21 09:11 DeepKnowledge1

By the way, I fixed that.

The distance computation is now vectorized and works with one or many images. The inference time improved a lot.

Dec 29 '21 09:12 DeepKnowledge1

ok, thanks

Dec 31 '21 01:12 leolv131
