PaDiM-Anomaly-Detection-Localization-master

While embedding_concat() runs, I get "Killed"

Open · wonchul-kim opened this issue 3 years ago · 5 comments

I think I have too much data...?

What do you think?

wonchul-kim · Aug 19 '21

> I think I have too much data...?
>
> What do you think?

Try cropsize=224; a smaller crop reduces the feature-map resolution and therefore the size of the concatenated embedding.

haobo827 · Apr 20 '22

If it is still relevant: this usually means the process ran out of RAM and was terminated by the OS (on Linux, the OOM killer prints exactly this "Killed" message). It happened for me with large datasets.

michaelstuffer98 · Jun 28 '23

If I use this model on a larger dataset, running out of RAM is a big issue. Has anybody solved it? I guess this is an inherent shortcoming of the method.

RichardChangCA · Sep 28 '23

@RichardChangCA what I did was run embedding_concat() repeatedly, after a fixed number of samples has been loaded, so that the channel index selection happens while the data is still being loaded; that is what reduces the memory usage. Below is a sample function to call after, say, every 1000 samples have been processed. Define it inside main so it can access the idx tensor and an embeddings = [] list, both of which you have to declare yourself in main. After each call, reinitialize the train_outputs/test_outputs dict to empty to free the memory.

def store_embeddings(model_outputs):
    # concatenate the batched outputs collected for each layer
    for k, v in model_outputs.items():
        model_outputs[k] = torch.cat(v, 0)
    # merge the layer features into one embedding volume
    embedding_vectors = model_outputs[layers[0]]
    for layer_name in layers[1:]:
        embedding_vectors = embedding_concat(embedding_vectors, model_outputs[layer_name])
    # keep only the randomly selected channels (PaDiM's dimensionality reduction)
    embedding_vectors = torch.index_select(embedding_vectors, 1, idx)
    embeddings.append(embedding_vectors)
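
For reference, a hedged sketch of how the loop in main might call this helper. The names train_dataloader, outputs, model, and device, and the chunk size of 1000, are assumptions based on the PaDiM reference code, not part of the snippet above:

from collections import OrderedDict
import torch

chunk_size = 1000                               # flush after this many samples (arbitrary)
train_outputs = OrderedDict([(l, []) for l in layers])
processed = 0
for x, _, _ in train_dataloader:                # assumes the loader yields (x, y, mask)
    with torch.no_grad():
        _ = model(x.to(device))                 # forward hooks fill the `outputs` list
    for layer_name, feat in zip(layers, outputs):
        train_outputs[layer_name].append(feat.cpu())
    outputs.clear()                             # reset the hook buffer
    processed += x.size(0)
    if processed >= chunk_size:
        store_embeddings(train_outputs)         # concat + index-select this chunk
        train_outputs = OrderedDict([(l, []) for l in layers])  # free memory
        processed = 0
if processed > 0:
    store_embeddings(train_outputs)             # flush the remainder
embedding_vectors = torch.cat(embeddings, 0)    # same tensor as before, lower peak RAM

The end result should be the same embedding_vectors tensor, but only one chunk's worth of raw layer outputs is ever held in memory alongside the already-reduced embeddings.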

michaelstuffer98 · Sep 28 '23

Thanks @michaelstuffer98. Do you know how to calculate the covariance matrix when the dataset is too large? I optimized the CPU memory usage for the other parts; only the covariance matrix calculation is left. I still have to store the embeddings of all normal training data to calculate the covariance matrix, and my machine cannot handle that.
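
(For reference, one standard way around storing all embeddings is the incremental estimator sketched below; this is a textbook technique, not code from this repository, and the function names are hypothetical. Shapes follow PaDiM's (N, C, H*W) layout and its 0.01 * I regularization.)

import torch

def update_cov_stats(chunk, stats=None):
    # Accumulate sufficient statistics for a per-position covariance estimate.
    # chunk: (n, d, P) tensor, n samples, d channels, P = H * W positions.
    # stats: (count, sum, sum_outer) from previous chunks, or None on the first call.
    n, d, P = chunk.shape
    if stats is None:
        # float64 keeps the running sums stable; use float32 to halve memory
        stats = (0,
                 torch.zeros(d, P, dtype=torch.float64),
                 torch.zeros(P, d, d, dtype=torch.float64))
    count, s, s_outer = stats
    c = chunk.to(torch.float64)
    s = s + c.sum(dim=0)                               # running sum, (d, P)
    s_outer = s_outer + torch.einsum('nip,njp->pij', c, c)  # sum of outer products, (P, d, d)
    return count + n, s, s_outer

def finalize_cov(stats, eps=0.01):
    # Turn the accumulated statistics into mean (d, P) and covariance (d, d, P).
    count, s, s_outer = stats
    mean = s / count
    # unbiased covariance: (sum_outer - N * m m^T) / (N - 1), plus eps * I as in PaDiM
    outer_mean = torch.einsum('ip,jp->pij', mean, mean)
    cov = (s_outer - count * outer_mean) / (count - 1)
    d = mean.shape[0]
    cov = cov + eps * torch.eye(d, dtype=torch.float64)
    return mean, cov.permute(1, 2, 0)                  # (d, d, P), matching PaDiM

Calling stats = update_cov_stats(chunk, stats) on each chunk produced by store_embeddings above, and finalize_cov(stats) once at the end, should reproduce the mean and covariance PaDiM computes (np.cov is also unbiased) up to floating-point error, with peak memory bounded by one chunk plus the (H*W, d, d) statistics tensor rather than growing with the dataset size.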

RichardChangCA · Sep 29 '23