PaDiM-Anomaly-Detection-Localization-master

Inference time

Open sangkyuleeKOR opened this issue 4 years ago • 25 comments

Thanks for your effort! I have a question about PaDiM. The paper reports an average inference time of 0.23 sec with R18-Rd100. But in the test phase, computing the Mahalanobis distance between the train and test image vectors takes about 9 sec when I use a GPU. Any comments? Thanks!

sangkyuleeKOR avatar Feb 02 '21 13:02 sangkyuleeKOR

Sorry, the implementation of the Mahalanobis distance is not elegant and takes up most of the inference time; there is still room for optimization.

xiahaifeng1995 avatar Feb 03 '21 07:02 xiahaifeng1995

Thanks for the reply! I think it would be faster to compute the Mahalanobis distance with a matrix multiplication instead of going through the vectors in a for loop.
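A minimal sketch of that idea for a single feature-map position, using made-up shapes and random data. The names `samples`, `mean`, and `cov_inv` are placeholders for illustration, not the repository's variables. The per-sample loop and the matrix form give the same distances:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = 5, 4                          # hypothetical batch size and embedding dimension
samples = rng.normal(size=(B, C))    # one embedding per test image at a single position
mean = rng.normal(size=C)
A = rng.normal(size=(C, C))
# inverse of a symmetric positive-definite matrix, standing in for the inverse covariance
cov_inv = np.linalg.inv(A @ A.T + 0.01 * np.eye(C))

# loop version: one quadratic form per sample
loop_dist = np.array([np.sqrt((s - mean) @ cov_inv @ (s - mean)) for s in samples])

# matrix version: all samples at once
delta = samples - mean                                    # (B, C)
mat_dist = np.sqrt(np.sum((delta @ cov_inv) * delta, axis=1))

print(np.allclose(loop_dist, mat_dist))
```

The same pattern extends to all positions at once by adding a position axis, which is what the einsum-based version later in this thread does.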

sangkyuleeKOR avatar Feb 04 '21 01:02 sangkyuleeKOR

Do you think that could be improved by multiprocessing or joblib packages?

DeepKnowledge1 avatar Feb 15 '21 09:02 DeepKnowledge1

Do you mean:

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = np.linalg.inv(train_outputs[1][:, :, i])
    dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]
    dist_list.append(dist)

This part takes a lot of time, right?

okokchoi avatar Mar 25 '21 08:03 okokchoi

@xiahaifeng1995, @okokchoi, you could also move the following line into the training phase and save its result together with the mean:

conv_inv = np.linalg.inv(train_outputs[1][:, :, i])

So, in the training part:

train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f,protocol=pickle.HIGHEST_PROTOCOL)

I replaced the following:

dist = [mahalanobis(sample[:, i], mean, conv_inv) for sample in embedding_vectors]

with:

import scipy.spatial.distance as SSD

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
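As a rough sketch of the precomputation suggested above, the inverses for all positions can be computed once at training time. The shapes and random data here are made up for illustration, and the variable names only mirror the thread. `np.linalg.inv` inverts the last two axes of a stacked array, so the position axis is moved to the front first:

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, P = 20, 4, 6                     # hypothetical: training images, channels, positions (H * W)
train = rng.normal(size=(N, C, P))     # stand-in for the training embedding vectors

mean = train.mean(axis=0)              # (C, P)
cov = np.zeros((C, C, P))
for i in range(P):
    # same regularised covariance as in the thread: cov + 0.01 * I
    cov[:, :, i] = np.cov(train[:, :, i], rowvar=False) + 0.01 * np.eye(C)

# invert every position in one call: (C, C, P) -> (P, C, C) -> inverse -> (C, C, P)
cov_inv = np.linalg.inv(cov.transpose(2, 0, 1)).transpose(1, 2, 0)

train_outputs = [mean, cov_inv]        # what would then be pickled instead of [mean, cov]
```

A feature file saved this way is not interchangeable with one that stores the plain covariance, so the test code has to agree with whichever version produced the file.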

DeepKnowledge1 avatar Mar 25 '21 08:03 DeepKnowledge1

Thanks a lot for your reply! I'm really sorry, but I think something is wrong with the code I modified:

            for i in range(H * W):
                mean = train_outputs[0][:, i]
                conv_inv = train_outputs[1][:, :, i]
                dist = cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
                dist_list.append(dist)
<Error>
Traceback (most recent call last):
  File "main_test.py", line 301, in <module>
    main()
  File "main_test.py", line 170, in main
    dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)
ValueError: axes don't match array

Every dist value has the same length, but something is wrong with dist_list.
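One likely reading of this traceback, sketched with NumPy stand-ins rather than the real data: `cdist` called with inputs of shape (B, C) and (1, C) returns an array of shape (B, 1), so stacking the per-position results gives three axes and `transpose(1, 0)` no longer matches. The zeros below only imitate that shape:

```python
import numpy as np

B, H, W = 3, 2, 2
# stand-in for the per-position cdist result, which has shape (B, 1)
dist_list = [np.zeros((B, 1)) for _ in range(H * W)]

stacked = np.array(dist_list)
print(stacked.shape)               # (H * W, B, 1): three axes

try:
    stacked.transpose(1, 0)        # two axes requested, three present
    failed = False
except ValueError:
    failed = True

# flattening each per-position result to length B restores the 2-D layout
fixed = np.array([d.ravel() for d in dist_list]).transpose(1, 0).reshape(B, H, W)
print(failed, fixed.shape)
```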

okokchoi avatar Mar 25 '21 10:03 okokchoi

@okokchoi, did you compute conv_inv and save it?

In the training part, replace the covariance computation with:

# conv_inv must be allocated with the same shape as cov, i.e. (C, C, H * W)
for i in range(H * W):
    cov[:, :, i] = np.cov(embedding_vectors[:, :, i].numpy(), rowvar=False) + 0.01 * I
    conv_inv[:, :, i] = np.linalg.inv(cov[:, :, i])
# save learned distribution (mean and the precomputed inverse covariance)
train_outputs = [mean, conv_inv]
with open(train_feature_filepath, 'wb') as f:
    pickle.dump(train_outputs, f, protocol=pickle.HIGHEST_PROTOCOL)

and in testing:

import itertools
import scipy.spatial.distance as SSD

dist_list = []

for i in range(H * W):
    mean = train_outputs[0][:, i]
    conv_inv = train_outputs[1][:, :, i]  # already inverted during training

    # cdist returns shape (B, 1); flatten it to a list of length B
    dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
    dist = list(itertools.chain(*dist))
    dist_list.append(dist)

dist_list = np.array(dist_list).transpose(1, 0).reshape(B, H, W)

# upsample
# ... continue with the rest of the code

DeepKnowledge1 avatar Mar 25 '21 11:03 DeepKnowledge1

I solved the problem: I was just loading the pkl file saved by the non-modified version. I have a question, @DeepKnowledge1: is the modified version faster than the original one? Anyway, thank you for your help :) You are the best!

okokchoi avatar Mar 25 '21 11:03 okokchoi

I think so, please try it and share your findings

DeepKnowledge1 avatar Mar 25 '21 12:03 DeepKnowledge1

Ok I will 👍

okokchoi avatar Mar 25 '21 12:03 okokchoi

@DeepKnowledge1 @okokchoi

I think it's pretty much the same. Depending on the size of the feature map, the lines below are heavy:

dist = SSD.cdist(embedding_vectors[:,:,i], mean[None, :], metric='mahalanobis', VI=conv_inv)
dist = list(itertools.chain(*dist))

Is there a way to run it in parallel?

ingbeeedd avatar May 26 '21 08:05 ingbeeedd

I improved it 3.5 times by using real process-based multiprocessing.

ingbeeedd avatar May 27 '21 06:05 ingbeeedd

Awesome! Did you use the multiprocessing module in Pytorch?

fryegg avatar May 27 '21 11:05 fryegg

Thank you for the code, @DeepKnowledge1. I tried your code and was able to improve my inference time from 80 secs to 43 secs!

I tried using Cython with the code, but it didn't improve much (this may be because SSD.cdist already uses optimised C code). The bottleneck in this code is SSD.cdist, as it has several loops within it. I then tried eliminating the loops altogether with vectorization.

Based on the Mahalanobis equation (which can be found in SciPy's documentation), I used einsum to multiply the arrays holding the mean, inv_cov, and embedding vectors without any looping. I was able to reduce my inference time from 43 secs to 2 secs!
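For reference, the distance and the way the einsum subscripts map onto it can be written as follows. The symbols $x_{n,l}$, $\mu_l$, and $\Sigma_l^{-1}$ are notation introduced here for image $n$ and feature-map position $l$; they are not names from the code:

```latex
d_{n,l} = \sqrt{(x_{n,l} - \mu_l)^{\top} \, \Sigma_l^{-1} \, (x_{n,l} - \mu_l)}
        = \sqrt{\sum_{j} \sum_{k} \delta_{n j l} \, (\Sigma^{-1})_{j k l} \, \delta_{n k l}},
\qquad \delta_{n j l} = (x_{n,l} - \mu_l)_j
```

The indices $j$ and $k$ are summed out and $n$, $l$ are kept, which corresponds to the subscript string 'njl,jkl,nkl->nl'.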

The code is as below

import numpy as np
from tqdm import tqdm

def calc_maha_dist_infer_vectorized(B, C, H, W, embedded_vector_model, embedding_vectors, dist_list):
    # dist_list is overwritten below; the argument is kept only for call compatibility
    with tqdm(total=3, desc="Loading…", ascii=False, ncols=75) as pbar:
        pbar.set_description("Extracting mean and cov from model...")
        pbar.refresh()
        mean = embedded_vector_model[0][:, :]  # (C, H * W)
        mean_reshaped = np.reshape(mean, [1, C, H * W])  # broadcasts over the batch axis
        pbar.update(1)

        conv_inv = embedded_vector_model[1][:, :, :]  # (C, C, H * W), inverted during training
        pbar.update(1)

        pbar.set_description("Calculating Mahalanobis Distance...")
        pbar.refresh()
        delta = embedding_vectors - mean_reshaped  # (B, C, H * W)
        # sqrt(delta^T @ conv_inv @ delta) for every image n and position l; result is (B, H * W)
        dist_list = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, conv_inv, delta))
        pbar.update(1)
        # single-position equivalent: np.sqrt(np.einsum('nj,jk,nk->n', delta, conv_inv, delta))

    return dist_list

To improve further, maybe real process-based multiprocessing, as mentioned by @ingbeeedd, could be implemented? I'd love to hear your thoughts.

By the way, I used this code for single-image inference, not for multiple images at a time, so with a batch the mean, inv_cov, and embedding_vectors arrays may be too large to compute the Mahalanobis distance in one go. Some modifications may be needed to process the data in batches.
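One way the batching mentioned above could look, as a sketch under assumed shapes: embeddings of shape (B, C, P), mean of shape (C, P), and inverse covariance of shape (C, C, P), with P = H * W. The function name `maha_in_chunks` and the `chunk_size` parameter are hypothetical:

```python
import numpy as np

def maha_in_chunks(embedding_vectors, mean, cov_inv, chunk_size=8):
    """Mahalanobis distance per image and position, computed chunk_size images at a time."""
    out = []
    for start in range(0, embedding_vectors.shape[0], chunk_size):
        delta = embedding_vectors[start:start + chunk_size] - mean[None]
        out.append(np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, cov_inv, delta)))
    return np.concatenate(out, axis=0)          # (B, P)

# small random example
rng = np.random.default_rng(0)
B, C, P = 10, 3, 4
emb = rng.normal(size=(B, C, P))
mean = rng.normal(size=(C, P))
A = rng.normal(size=(P, C, C))
# per-position inverse of a positive-definite matrix, laid out as (C, C, P)
cov_inv = np.linalg.inv(A @ A.transpose(0, 2, 1) + 0.01 * np.eye(C)).transpose(1, 2, 0)

chunked = maha_in_chunks(emb, mean, cov_inv, chunk_size=4)
```

Chunking only bounds the size of the intermediate `delta` array; the mean and inverse covariance are still held in memory in full.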

GreatScherzo avatar May 28 '21 00:05 GreatScherzo

@GreatScherzo That's exactly what I wanted to do: change the loop into a matrix calculation. I will apply some modifications to this.

fryegg avatar May 28 '21 00:05 fryegg

@fryegg @GreatScherzo I have written as follows.

import itertools
import multiprocessing
import time
from multiprocessing import Process

import numpy as np
import scipy.spatial.distance as SSD
import torch
import torch.nn.functional as F

manager = multiprocessing.Manager()
cpu_core = 8
# one shared list per worker, so results can be concatenated in position order afterwards
dist_list = manager.list()
for number in range(cpu_core):
    dist_list.append(manager.list())

def calculate_distance(number, start, end, train_outputs, embedding_vectors):
    global dist_list
    for i in range(start, end):
        mean = train_outputs[0][:, i]
        conv_inv = train_outputs[1][:, :, i]  # already inverted during training
        dist = SSD.cdist(embedding_vectors[:, :, i], mean[None, :], metric='mahalanobis', VI=conv_inv)
        dist = list(itertools.chain(*dist))
        dist_list[number].append(dist)

Main function:

procs = []
start = time.time()
chunk = H * W // cpu_core
for number in range(cpu_core):
    s = number * chunk
    # the last worker also takes the remainder when H * W is not divisible by cpu_core
    e = H * W if number == cpu_core - 1 else (number + 1) * chunk
    proc = Process(target=calculate_distance, args=(number, s, e, train_outputs, embedding_vectors))
    procs.append(proc)
    proc.start()

for proc in procs:
    proc.join()

print("time :", time.time() - start)

final_list = []
for number in range(cpu_core):
    final_list.extend(dist_list[number])

final_list = np.array(final_list).transpose(1, 0).reshape(B, H, W)
final_list = torch.tensor(final_list)
score_map = F.interpolate(final_list.unsqueeze(1), size=x.size(2), mode='bilinear', align_corners=False).squeeze().numpy()

I'd appreciate it if you could give me your opinion.
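One detail worth checking when splitting the H * W positions across workers: plain floor-division bounds skip the trailing positions whenever H * W is not a multiple of the worker count. The sketch below uses made-up sizes to show the gap, with `np.array_split` as one possible way to get bounds that cover every position:

```python
import numpy as np

H, W, cpu_core = 5, 5, 8               # 25 positions do not divide evenly among 8 workers

# plain floor-division bounds
chunk = H * W // cpu_core
floor_bounds = [(n * chunk, (n + 1) * chunk) for n in range(cpu_core)]
covered_floor = sum(e - s for s, e in floor_bounds)

# np.array_split spreads the remainder over the first few chunks
parts = np.array_split(np.arange(H * W), cpu_core)
split_bounds = [(int(p[0]), int(p[-1]) + 1) for p in parts]
covered_split = sum(e - s for s, e in split_bounds)

print(covered_floor, covered_split)
```

Each (start, end) pair in `split_bounds` can be passed to a worker in place of the hand-computed range. This assumes there are at least as many positions as workers, so that no chunk is empty.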

ingbeeedd avatar May 28 '21 03:05 ingbeeedd

@ingbeeedd Thank you very much for sharing your code! I haven't had time to test it yet, but I'll be sure to share the speed results once I've tried it!

GreatScherzo avatar May 31 '21 03:05 GreatScherzo

@fryegg @GreatScherzo I computed the Mahalanobis distance on the GPU, and it is 24 times faster than before (CPU parallel processing gave 3.5 times), so it improves on the CPU-parallel version by about 6 times.

ingbeeedd avatar Jun 02 '21 04:06 ingbeeedd

@ingbeeedd Nice Work! How did you calculate Mahalanobis distance with GPU? Did you change 'embedding vector' into tensor?

fryegg avatar Jun 03 '21 00:06 fryegg

@fryegg The code is being cleaned up. I'll leave a comment as soon as it's organized.

ingbeeedd avatar Jun 09 '21 02:06 ingbeeedd

@GreatScherzo @fryegg @DeepKnowledge1 @okokchoi @xiahaifeng1995 @prob1995 @sangkyuleeKOR

https://github.com/ingbeeedd/PaDiM-EfficientNet I've uploaded the code :)

ingbeeedd avatar Jul 15 '21 14:07 ingbeeedd

Hi @GreatScherzo ,

Thanks for your improvement. It is faster, but the scores are different: the scores for the normal images are higher than those for the defective images. Do you have any explanation?

DeepKnowledge1 avatar Nov 20 '21 08:11 DeepKnowledge1

@GreatScherzo Thanks for your code.

It works fine with only one image, but with a batch the scores are very different. I think the error is in the einsum call, and I have no idea how to fix it :)
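The thread does not say what the eventual fix was. One possible cause, offered only as a guess: the einsum result already has shape (B, H * W), so keeping the loop version's `transpose(1, 0)` before the reshape would scramble the values for B > 1 while being harmless for B = 1. The sketch below checks this on random data with made-up shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 3, 4, 2, 2
P = H * W
emb = rng.normal(size=(B, C, P))
mean = rng.normal(size=(C, P))
A = rng.normal(size=(P, C, C))
cov_inv = np.linalg.inv(A @ A.transpose(0, 2, 1) + 0.01 * np.eye(C)).transpose(1, 2, 0)

# reference: explicit loops over images and positions
ref = np.zeros((B, P))
for n in range(B):
    for l in range(P):
        d = emb[n, :, l] - mean[:, l]
        ref[n, l] = np.sqrt(d @ cov_inv[:, :, l] @ d)

delta = emb - mean[None]
vec = np.sqrt(np.einsum('njl,jkl,nkl->nl', delta, cov_inv, delta))   # already (B, P)

good = vec.reshape(B, H, W)                   # reshape only
bad = vec.transpose(1, 0).reshape(B, H, W)    # the loop version's transpose, kept by mistake

# with a single image the extra transpose changes nothing
one = vec[:1]
single_ok = np.allclose(one.transpose(1, 0).reshape(1, H, W), one.reshape(1, H, W))

print(np.allclose(good, ref.reshape(B, H, W)), np.allclose(bad, good), single_ok)
```

If this is indeed what happened, the einsum itself matches the looped reference for batches, and only the reshaping step after it needs to change.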

DeepKnowledge1 avatar Nov 23 '21 09:11 DeepKnowledge1

By the way, I fixed that.

So now the distance is vectorized and works whether you have one image or many. The inference time improved a lot.

DeepKnowledge1 avatar Dec 29 '21 09:12 DeepKnowledge1

ok, thanks

leolv131 avatar Dec 31 '21 01:12 leolv131