Reconstruction-by-inpainting-for-visual-anomaly-detection
NaN loss after a few steps
Hi, thanks for posting this code.
I'm trying to replicate the results, but the loss becomes NaN after 11 steps.
I installed all the dependencies at the described versions, but I still get this result.
Please find the training log below:
$ python train.py --obj zipper --data_path ./data/mvtech_anomaly --batch_size 2
{'alpha': 1.0, 'batch_size': 2, 'belta': 1.0, 'data_path': ' ./data/mvtech_anomaly', 'data_type': 'mvtec', 'epochs': 300, 'gamma': 1.0, 'grayscale': False, 'img_size': 256, 'input_channel': 3, 'k_value': [2, 4, 8, 16], 'lr': 0.0001, 'obj': 'zipper', 'prefix': '2020-12-03-1197', 'save_dir': './mvtec/zipper/seed_2988/', 'seed': 2988, 'validation_ratio': 0.2, 'weight_decay': 1e-05}
1/300 ----- [[2020-12-03 23:30:45]] [Need: 00:00:00]
0%| | 0/96 [00:00<?, ?it/s]Step Loss: 1.779465
1%|█ | 1/96 [00:02<03:22, 2.13s/it]Step Loss: 1.835103
2%|██ | 2/96 [00:03<02:52, 1.83s/it]Step Loss: 1.479402
3%|███ | 3/96 [00:04<02:36, 1.69s/it]Step Loss: 1.401773
4%|████ | 4/96 [00:05<02:26, 1.59s/it]Step Loss: 1.448756
5%|█████ | 5/96 [00:07<02:13, 1.46s/it]Step Loss: 1.693701
6%|██████ | 6/96 [00:08<02:02, 1.36s/it]Step Loss: 1.229446
7%|███████ | 7/96 [00:09<02:00, 1.36s/it]Step Loss: 1.215524
8%|████████ | 8/96 [00:10<02:00, 1.36s/it]Step Loss: 1.493567
9%|█████████ | 9/96 [00:12<01:52, 1.29s/it]Step Loss: 1.430892
10%|██████████ | 10/96 [00:13<01:46, 1.24s/it]Step Loss: 1.118710
11%|███████████ | 11/96 [00:14<01:48, 1.28s/it]Step Loss: nan
12%|████████████ | 12/96 [00:15<01:43, 1.23s/it]Step Loss: nan
14%|█████████████ | 13/96 [00:16<01:41, 1.22s/it]
@esdrascosta The same result occurs from time to time, and we are trying to find the reason too. For now, maybe just grab a cup of coffee and try running it again.
Hi @plutoyuxie, Thanks for sharing your codes. I was also working on the implementation of RIAD. Your codes are great and helped me a lot. I really appreciate that.
Hi @esdrascosta, I ran into this problem before. I fixed it by changing line 27 of gms_loss.py to 'x = torch.sqrt(x + sys.float_info.epsilon)', and I have not seen the NaN loss since. I think the problem is that the derivative of sqrt is infinite at 0. You can try this modification; I hope it helps.
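To illustrate why the epsilon term matters: the gradient of sqrt(x) is 1/(2*sqrt(x)), which diverges as x approaches 0, so any exactly-zero entry in the similarity map produces inf/NaN gradients during backprop. A minimal sketch (pure Python, with a hypothetical `sqrt_grad` helper standing in for what autograd computes) shows how adding `sys.float_info.epsilon` keeps the gradient finite:

```python
import math
import sys

def sqrt_grad(x, eps=0.0):
    """Analytic gradient of sqrt(x + eps) with respect to x."""
    denom = 2.0 * math.sqrt(x + eps)
    return 1.0 / denom if denom > 0 else float("inf")

# Without epsilon, the gradient diverges at x = 0:
print(sqrt_grad(0.0))  # inf

# With epsilon, the gradient is huge but finite, so training stays stable:
g = sqrt_grad(0.0, sys.float_info.epsilon)
print(math.isfinite(g))  # True
```

The same reasoning applies to torch.sqrt: once a single element of the tensor is exactly 0, its gradient is inf, and that NaN/inf then propagates through every subsequent update.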
BTW, have you ever tried training ONE reconstruction model for multiple objects? I am trying this, but the reconstruction results are not as good as with a single object.
Thanks, @MaDongao, I will try it soon. Reconstructing multiple objects is much harder. As far as I know, the state-of-the-art method is PaDiM, which is not a reconstruction-based method.
@plutoyuxie The following library might be helpful for your implementation. https://github.com/photosynthesis-team/piq