Add option to ignore SSL certificate errors
import ssl

ctx = ssl.create_default_context()
# check_hostname must be turned off before verify_mode can be set to CERT_NONE
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
https://stackoverflow.com/a/58337431
This can be important when URLs are not guaranteed to have a valid SSL certificate.
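To make the request concrete, here is a minimal sketch of how such an option could be wired into a `urllib`-based download. The names `make_ssl_context`, `download_image` and `verify_certificate` are hypothetical and only for illustration; they are not img2dataset's actual API.

```python
import ssl
import urllib.request


def make_ssl_context(verify_certificate: bool = True) -> ssl.SSLContext:
    """Build an SSL context, optionally with certificate checks disabled.

    Hypothetical helper for illustration, not part of img2dataset.
    """
    ctx = ssl.create_default_context()
    if not verify_certificate:
        # check_hostname must be disabled before verify_mode is set to
        # CERT_NONE, otherwise the ssl module raises a ValueError.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def download_image(url: str, timeout: float = 10.0,
                   verify_certificate: bool = True) -> bytes:
    """Fetch the raw bytes at `url`, honoring the verification flag."""
    ctx = make_ssl_context(verify_certificate)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as response:
        return response.read()
```

Keeping verification on by default means the insecure behavior has to be requested explicitly, for example with `download_image(url, verify_certificate=False)`.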
Interesting, I came across that issue as well. Do you see any disadvantage in using it all the time?
Yes. If you download from an https URL that fails the certificate check, it may mean that nodes on your network path (routers, proxies, ...) are sending you incorrect data. So refusing images that fail that check can be a valid choice.
However, I do think the option is useful, since many websites simply have invalid certificates.
Yeah, I noticed it as well. I guess if the main risk is that you download an incorrect image you'll find it out when you try to decode/resize it.
Is this a change you made in the source code? I still get a lot of SSL errors on my machine.
../cc3m_val2/00001_stats.json:
{
  "count": 5840,
  "successes": 292,
  "failed_to_download": 5517,
  "failed_to_resize": 31,
  "duration": 618.1924488544464,
  "start_time": 1687167190.7557843,
  "end_time": 1687167808.9482331,
  "status_dict": {
    "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1131)>": 3332,
    "HTTP Error 400: Bad Request": 12,
    "<urlopen error [Errno -2] Name or service not known>": 124,
    "success": 292,
    "HTTP Error 403: Forbidden": 38,
    "<urlopen error [Errno -5] No address associated with hostname>": 19,
    "HTTP Error 404: Not Found": 69,
    "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)>": 51,
    "<urlopen error _ssl.c:1114: The handshake operation timed out>": 61,
"
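For a sense of scale, the certificate failures in the stats above can be tallied with a few lines of Python. This is only a sketch over the entries visible in the paste (the rest of `status_dict` is cut off), and `ssl_failure_share` is a hypothetical helper name.

```python
def ssl_failure_share(status_dict, failed_to_download):
    """Count CERTIFICATE_VERIFY_FAILED entries and their share of failed downloads."""
    cert_failures = sum(
        count
        for status, count in status_dict.items()
        if "CERTIFICATE_VERIFY_FAILED" in status
    )
    return cert_failures, cert_failures / failed_to_download


# Only the two certificate-related entries visible in the pasted stats file.
visible_status = {
    "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
    "unable to get local issuer certificate (_ssl.c:1131)>": 3332,
    "<urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: "
    "self signed certificate in certificate chain (_ssl.c:1131)>": 51,
}

cert_failures, share = ssl_failure_share(visible_status, failed_to_download=5517)
print(cert_failures, round(share, 3))
```

On these numbers, certificate verification accounts for 3383 of the 5517 failed downloads, roughly 61%.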
Same here. Did you add the option?
No, the issue is still open. You can open a PR if you need it.
Sounds good, here is a PR: https://github.com/rom1504/img2dataset/pull/397
Great work on this library btw!