add option to ignore ssl certificate

Open rom1504 opened this issue 3 years ago • 7 comments

import ssl

# Build a context that skips hostname and certificate verification
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

https://stackoverflow.com/a/58337431

This can be important if URLs are not guaranteed to have a valid SSL certificate.
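
As a rough illustration of how such an option might be wired in, here is a minimal sketch that assumes downloads go through `urllib.request.urlopen`. The helper names `make_ssl_context` and `fetch` and the `verify` parameter are hypothetical and are not part of img2dataset.

```python
import ssl
import urllib.request


def make_ssl_context(verify: bool = True) -> ssl.SSLContext:
    """Return a default SSL context, optionally with verification disabled.

    Hypothetical helper for illustration only.
    """
    ctx = ssl.create_default_context()
    if not verify:
        # check_hostname has to be switched off before verify_mode
        # can be set to CERT_NONE, otherwise ssl raises ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url: str, verify: bool = True, timeout: float = 10.0) -> bytes:
    """Download url and return the raw bytes (hypothetical helper)."""
    request = urllib.request.Request(url)
    context = make_ssl_context(verify)
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return response.read()
```

Calling `fetch(url, verify=False)` would then accept hosts with invalid or self-signed certificates, while the default keeps verification on.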

rom1504 • Jan 29 '22 19:01

Interesting, I came across that issue as well. Do you see any disadvantage in using it all the time?

borisdayma • Sep 08 '22 14:09

Yes. If an https URL fails the certificate check, it may mean that nodes on your network path (routers, proxies, ...) are sending you incorrect data. So refusing images that fail that check can be a valid choice.

However, I do think the option is useful, since many websites simply have invalid certificates.

rom1504 • Sep 08 '22 19:09

Yeah, I noticed it as well. I guess if the main risk is downloading an incorrect image, you'll find out when you try to decode/resize it.

borisdayma • Sep 08 '22 21:09

Is this a change you made in the source code? I still get a lot of SSL errors on my machine.

../cc3m_val2/00001_stats.json:

{
    "count": 5840,
    "successes": 292,
    "failed_to_download": 5517,
    "failed_to_resize": 31,
    "duration": 618.1924488544464,
    "start_time": 1687167190.7557843,
    "end_time": 1687167808.9482331,
    "status_dict": {
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>": 3332,
        "HTTP Error 400: Bad Request": 12,
        "<urlopen error [Errno -2] Name or service not known>": 124,
        "success": 292,
        "HTTP Error 403: Forbidden": 38,
        "<urlopen error [Errno -5] No address associated with hostname>": 19,
        "HTTP Error 404: Not Found": 69,
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)>": 51,
        "<urlopen error _ssl.c:1114: The handshake operation timed out>": 61,
        "": 1719,
        "timed out": 17,
        "HTTP Error 403: Access Forbidden": 6,
        "<urlopen error [Errno 111] Connection refused>": 15,
        "<urlopen error [Errno -3] Temporary failure in name resolution>": 28,
        "HTTP Error 404: Site Not Found": 1,
        "Use of image disallowed by X-Robots-Tag directive": 1,
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (ssl.c:1131)>": 2,
        "HTTP Error 503: Service Unavailable": 2,
        "HTTP Error 500: Domain Not Found": 2,
        "Image decoding error": 27,
        "<urlopen error [Errno 104] Connection reset by peer>": 2,
        "[Errno 104] Connection reset by peer": 6,
        "HTTP Error 308: Permanent Redirect": 1,
        "HTTP Error 404: Object Not Found": 1,
        "OpenCV(4.7.0) /io/opencv/modules/imgcodecs/src/loadsave.cpp:798: error: (-215:Assertion failed) !buf.empty() in function 'imdecode'\n": 4,
        "HTTP Error 502: Bad Gateway": 2,
        "Remote end closed connection without response": 1,
        "<urlopen error [Errno 113] No route to host>": 3,
        "HTTP Error 410: Gone": 2
    }
}

shuguang99 • Jun 19 '23 20:06
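
To gauge how much an option like this would help, one can sum the certificate-related entries of a stats file like the one posted above. The sketch below is only an illustration: `certificate_failure_share` is a hypothetical helper, and `sample_stats` is an abbreviated copy of the posted numbers (the three `CERTIFICATE_VERIFY_FAILED` entries plus two others).

```python
def certificate_failure_share(stats: dict) -> float:
    """Fraction of all attempted downloads that failed certificate verification."""
    cert_failures = sum(
        count
        for status, count in stats["status_dict"].items()
        if "CERTIFICATE_VERIFY_FAILED" in status
    )
    return cert_failures / stats["count"]


# Abbreviated copy of the stats posted above (not the full status_dict).
sample_stats = {
    "count": 5840,
    "status_dict": {
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
        "unable to get local issuer certificate (_ssl.c:1131)>": 3332,
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
        "self signed certificate in certificate chain (_ssl.c:1131)>": 51,
        "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
        "self signed certificate (ssl.c:1131)>": 2,
        "success": 292,
        "HTTP Error 404: Not Found": 69,
    },
}

share = certificate_failure_share(sample_stats)
print(f"{share:.1%} of downloads failed certificate verification")
```

For these numbers that is 3385 of 5840 attempts, or roughly 58%. The full file could be loaded with `json.load` and passed to the same function.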

Same here. Did you add the option?

theophilegervet • Jan 28 '24 07:01

No, the issue is still open. You can open a PR if you need it.

rom1504 • Jan 28 '24 09:01

Sounds good, here is a PR: https://github.com/rom1504/img2dataset/pull/397

Great work on this library btw!

theophilegervet • Jan 28 '24 16:01