
RuntimeWarning: invalid value encountered in less

Open cw-plus opened this issue 7 years ago • 7 comments

Hi, when I train facenet, some errors occur:

    (tf17) wc@ubuntu:~/FaceNet/facenet$ python src/train_tripletloss.py --logs_base_dir ~/logs/facenet/ --models_base_dir ~/models/facenet/ --data_dir /home/wc/FaceNet/datasets/post_process --image_size 160 --model_def models.inception_resnet_v1 --lfw_dir /home/wc/FaceNet/datasets/lfw/lfw_mtcnnpy_160 --optimizer RMSPROP --learning_rate 0.01 --weight_decay 1e-4 --max_nrof_epochs 500
    Model directory: /home/wc/models/facenet/20180506-225806
    Log directory: /home/wc/logs/facenet/20180506-225806
    LFW directory: /home/wc/FaceNet/datasets/lfw/lfw_mtcnnpy_160
    2018-05-06 22:59:34.151359: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    Running forward pass on sampled images: 141.935
    Selecting suitable triplets for training
    src/train_tripletloss.py:295: RuntimeWarning: invalid value encountered in less
      all_neg = np.where(np.logical_and((neg_dists_sqr-pos_dist_sqr)<alpha, pos_dist_sqr<neg_dists_sqr))[0] # FaceNet selection
    (nrof_random_negs, nrof_triplets) = (18017, 17986): time=143.185 seconds

Please help me. Thanks.

cw-plus, May 07 '18 06:05

Did anyone figure out the reason, @ChaoWangHS? I am getting the same error too, and it seems to happen only for specific batch sizes.

abhisheksgumadi, May 11 '18 00:05

Have you resolved it? I am getting the same error too.

StonePanda, Mar 17 '20 17:03

Hello, I have found that the warning still appears even when training succeeds, so you can just ignore it. I also found that the reason my process was "killed" is OOM: it was killed by CentOS. Increasing virtual memory worked for me.

StonePanda, Mar 23 '20 09:03
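
For readers wondering why the warning can be harmless: NumPy reports "invalid value encountered in less" when one side of a `<` comparison contains NaN, and such comparisons simply evaluate to False. The sketch below mirrors the expression from the traceback on made-up distances. It assumes that some entries of `neg_dists_sqr` are NaN (for example, if the script masks same-identity images that way, which this thread does not confirm). All values are hypothetical, and whether the warning is actually printed depends on the NumPy version and platform, so it is silenced here.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the training run.
alpha = 0.2
pos_dist_sqr = 0.5
neg_dists_sqr = np.array([0.6, np.nan, 1.5, 0.4])

# Same selection expression as in the traceback. Comparisons against NaN
# evaluate to False, so the NaN entry can never be picked as a negative.
with np.errstate(invalid="ignore"):  # silence the RuntimeWarning, if any
    mask = np.logical_and(
        (neg_dists_sqr - pos_dist_sqr) < alpha,
        pos_dist_sqr < neg_dists_sqr,
    )
all_neg = np.where(mask)[0]

print("candidate negatives:", all_neg)
print("NaN entries skipped:", int(np.isnan(neg_dists_sqr).sum()))
```

If only a few distances are NaN, triplet selection still works on the remaining entries, which is consistent with the non-zero `nrof_triplets` in the original log.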

Hello, have you solved this problem? After I successfully train for one epoch, nrof_triplets always remains 0. The problem is shown below. I used the CASIA dataset for training. [image] I also checked the embeddings in the function select_triplets and found that, from the second epoch on, they are filled with NaN. However, when I used a smaller dataset for training, it ran normally. Could you tell me how to solve this problem? I would appreciate any advice!

Richard-wang85, May 07 '22 00:05
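
When every embedding is NaN, every distance comparison is False and no triplet can be selected, which would match `nrof_triplets` staying at 0. A minimal sketch of a check that could be run on the embedding array before triplet selection is shown below. The helper name `nonfinite_rows` and the sample array are hypothetical and not part of facenet; the check only detects the problem and does not explain why the network started producing NaN.

```python
import numpy as np

def nonfinite_rows(embeddings):
    """Return the indices of rows that contain NaN or inf (hypothetical helper)."""
    emb = np.asarray(embeddings, dtype=np.float64)
    return np.where(~np.isfinite(emb).all(axis=1))[0]

# Made-up embeddings: row 1 contains NaN, row 2 contains inf.
emb_array = np.array([
    [0.1, 0.2],
    [np.nan, 0.3],
    [0.5, np.inf],
])

bad = nonfinite_rows(emb_array)
if bad.size:
    print(f"{bad.size} of {len(emb_array)} embeddings are non-finite: rows {bad.tolist()}")
```

Running such a check after each forward pass shows in which epoch the embeddings first become non-finite, which narrows down where to look next.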

    Hello, your email has been received.

StonePanda, May 07 '22 00:05

Hi @Richard-wang85, I am encountering the same problem. Did you find a solution?

Thanks, Lukas

Lasklu, Jan 07 '23 21:01

    Hello, your email has been received.

StonePanda, Jan 07 '23 21:01