
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)

Open judicalp opened this issue 4 years ago • 12 comments

Hello, thanks for the amazing work. I configured everything, but when I run it I get this message:

Traceback (most recent call last):
  File "/Users/Judicael/PycharmProjects/DiscordScrapper/discord.py", line 347, in <module>
    ds.grab_server_data()
  File "/Users/Judicael/PycharmProjects/DiscordScrapper/discord.py", line 323, in grab_server_data
    self.get_server_name(server),
  File "/Users/Judicael/PycharmProjects/DiscordScrapper/discord.py", line 164, in get_server_name
    server = request.grab_page('https://discordapp.com/api/%s/guilds/%s' % (self.api, serverid))
  File "/Users/Judicael/PycharmProjects/DiscordScrapper/SimpleRequests/SimpleRequestsPy3.py", line 187, in grab_page
    resp = self.get_response(url)
  File "/Users/Judicael/PycharmProjects/DiscordScrapper/SimpleRequests/SimpleRequestsPy3.py", line 99, in get_response
    conn.request('GET', urlsplices.path, headers=self.headers)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1424, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)

Process finished with exit code 1

Any idea what's wrong? Thank you.

judicalp, Dec 30 '20 15:12

It appears you're using the main branch. Have you tested this with the experimental branch?

The reason why I make reference to the experimental branch is that it's a complete rewrite of the code and as such it might be more stable than the main branch which hasn't seen a major update in over a year (at least with the SimpleRequests module files).

Let me know if this same issue occurs in the experimental branch. I'll be figuring out what is going on with SSL in the main branch for Python 3, since I don't think this same issue occurs with the Python 2 code (urllib2 vs http.client).

Dracovian, Dec 30 '20 16:12

Thank you, that worked perfectly, but now I am confused. If I run discord.py, it executes with no end (I have to interrupt it after a few minutes), looping day by day into the past and returning no error. If I run any other script, I just get a response of 0 immediately.

What should I expect as a result? Thanks.

judicalp, Dec 30 '20 19:12

I can see the script goes to the exception branch of the try: statement in the startGuild function, but I cannot figure out why, or what the error message is.
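A handler that catches an exception without reporting it hides exactly this kind of information. The sketch below shows one way to surface the message and the stack from inside an except block. The function start_guild and its fetch_name argument are hypothetical stand-ins for illustration, not the repository's actual startGuild code.

```python
import traceback


def start_guild(fetch_name, guild_id):
    """Hypothetical stand-in for startGuild; fetch_name is any callable."""
    try:
        return fetch_name(guild_id)
    except Exception as exc:
        # Report the exception type and message, then the full stack,
        # instead of silently continuing.
        print(f"{type(exc).__name__}: {exc}")
        traceback.print_exc()
        return None


def failing_fetch(guild_id):
    """Simulates a request that fails, like the SSL error in this thread."""
    raise ConnectionError(f"could not reach guild {guild_id}")


result = start_guild(failing_fetch, "1234")
```

With the error printed, the underlying cause becomes visible even when the surrounding loop keeps running.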

judicalp, Dec 30 '20 19:12

OK, now if I print scraper.grabGuildName(guild) in the exception handler, I get exactly the same problem as before:

Traceback (most recent call last):
  File "/Users/Judicael/PycharmProjects/DiscordScrapper2/discord.py", line 53, in startGuild
    scraper.grabGuildName(guild)
  File "/Users/Judicael/PycharmProjects/DiscordScrapper2/module/DiscordScraper.py", line 208, in grabGuildName
    response = request.sendRequest(url)
  File "/Users/Judicael/PycharmProjects/DiscordScrapper2/module/RequestB.py", line 78, in sendRequest
    connection.request('GET', urlpath, headers=self.headers)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1255, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1301, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/http/client.py", line 1424, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)

judicalp, Dec 30 '20 19:12

OK, I solved it by running "Install Certificates.command" on my Mac.
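For anyone who cannot run that command, a possible alternative is to hand http.client an SSL context that points at a known CA bundle. This is only a sketch under assumptions: it uses the third-party certifi package if it happens to be installed, falls back to the default trust store otherwise, and is not how either branch of the scraper builds its connections.

```python
import http.client
import ssl


def build_context():
    """Return a verifying SSL context, preferring certifi's CA bundle."""
    try:
        import certifi  # third-party; assumed installed with `pip install certifi`
    except ImportError:
        # Fall back to whatever trust store this Python build can locate.
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


context = build_context()
# Certificate verification stays enabled; only the CA bundle source changes.
# Creating the connection object does not open a socket yet.
connection = http.client.HTTPSConnection("discordapp.com", context=context)
```

Disabling verification would also silence the error, but it removes the protection that the check provides, so the sketch deliberately keeps it on.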

Now the script runs and creates folders for the scrape, but they remain empty.

judicalp, Dec 30 '20 20:12

Right, I'm going to have to start testing this script on my installation of Big Sur. The other thing is the inability to interrupt the runtime of the script; I have attempted to add checks for keypresses such as the "CTRL+C" one that is probably "Command+C" on macOS to no real avail.

I'll keep looking into that since it's a pain to have to kill the process in a separate terminal or through some task manager (or the force quit option in macOS).

Now as far as the folders remaining empty, this is heavily dependent on how many days it has been since anything was uploaded to the channel(s). The script takes the current date and year and then goes backwards until around the beginning of 2015 (Discord went public around May 2015 but some developer guilds probably have content spanning back before then if they're still publicly accessible).

So what this means is that if the latest image/video that was added to a particular channel happens to be from January of this year, then you have a long time to wait for the script to find that file. The thing is that it checks all the given channels day-by-day so if you have 10 channels then it will take 10 requests to go through each day (this means that even if the content was uploaded 10 days ago then that would require about 100 requests in total before the script gets around to accessing that data).
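The cost described above is simple arithmetic, and a small sketch makes it concrete. The helper names days_back and estimate_requests are made up for this illustration, and the 2015-01-01 stop date is an assumption based on the "beginning of 2015" mentioned earlier.

```python
from datetime import date, timedelta


def days_back(start, stop):
    """Yield each day from start back to stop (inclusive), newest first."""
    current = start
    while current >= stop:
        yield current
        current -= timedelta(days=1)


def estimate_requests(channel_count, days_scanned):
    """One search request per channel per day scanned."""
    return channel_count * days_scanned


start = date(2020, 12, 30)  # the day this thread was written
stop = date(2015, 1, 1)     # assumed lower bound of the scan
total_days = sum(1 for _ in days_back(start, stop))

# 10 channels whose newest content is 10 days old
print(estimate_requests(10, 10))  # → 100
# Worst case: 10 channels scanned all the way back to the stop date
print(estimate_requests(10, total_days))
```

The worst case grows linearly with both the number of channels and the age of the newest upload, which is why a per-channel start date would help.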

I have thought about adding more data to the config file for setting start dates on each individual channel to alleviate this, but I'm still holding out for a cleaner solution (maybe something involving JSON caching which is likely to yield smaller files and then sifting through the JSON data that was downloaded to find the embedded content).

Either way you've probably seen some interesting settings added to the config of the experimental branch; and I have delved into further detail on all of this in the repo's wiki:

Information on the config file for the experimental branch

A complete function reference for the experimental branch

A partial (but relevant) programmer's reference for message scraping on Discord

If you're just wanting to test out a single channel and set the start day to something that isn't today then I did include an optional function argument that takes a datetime object (used for recursion in the script but can be accessed manually if desired) here.

Dracovian, Dec 30 '20 20:12

That is great, thank you so much for your detailed answer. The Discord channel has very few images and videos, but text every day (several times per day). The script would scrape the text as well, right? Or did I miss something? Thanks.

judicalp, Dec 30 '20 20:12

So far I haven't implemented a JSON caching method, and my previous attempts at scraping text yielded inconsistent results: some days got skipped for some reason, and duplicates sometimes occurred, since Discord's search feature returns four posts surrounding the one that was searched (to give some context to the post's content, I assume).
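One way to handle the duplicate posts is to key every message on its ID and keep only the first copy seen. The sketch below assumes each result is a dict with an "id" field, as Discord message objects have; the function name merge_unique and the sample data are invented for illustration.

```python
def merge_unique(pages):
    """Flatten result pages, keeping the first copy of each message ID."""
    seen = set()
    unique = []
    for page in pages:
        for message in page:
            if message["id"] in seen:
                continue  # a context post already collected from another hit
            seen.add(message["id"])
            unique.append(message)
    return unique


# Two overlapping result pages: message "2" appears in both.
pages = [
    [{"id": "1", "content": "a"}, {"id": "2", "content": "b"}],
    [{"id": "2", "content": "b"}, {"id": "3", "content": "c"}],
]
messages = merge_unique(pages)
print([m["id"] for m in messages])  # → ['1', '2', '3']
```

The set of seen IDs is small compared with the messages themselves, so it could be kept across days without building one massive JSON file.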

I do plan on figuring out a way to get the script to respond to keyboard shortcuts and I also plan on having the script archive JSON contents as soon as I figure out how to efficiently deal with the duplicate post issue (without having to append them to a single massive JSON file as I thought about doing a couple of years ago).

On top of that I do have to work on the features of the extra functions in the config file and once I do all of that then I'll merge the experimental branch code and it will become the new main branch code.

Now as far as some of the added options in the experimental branch config file; I do have some prior experience with implementing some of those features in previous scripts I have written:

validateFileHeader: Something like this but with far more values to check, I'll also make sure that the script warns about excess data at the expected end of the file (potentially data hidden at the end of an image that won't show up in your standard image viewer/editor program). https://imgur.com/ufPJZKx
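A header check along these lines could look like the sketch below. It is not the validateFileHeader implementation: the signature table covers only three formats, the function names are hypothetical, and the excess-data check handles PNG alone by walking its chunks up to IEND without verifying CRCs.

```python
import struct

# Leading "magic" bytes for a few common formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "gif": b"GIF8",
}


def detect_type(data):
    """Return the format whose signature matches, or None."""
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def png_excess_bytes(data):
    """Count bytes after the IEND chunk; None if no IEND chunk is found."""
    pos = len(SIGNATURES["png"])  # skip the 8-byte signature
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        pos += 12 + length  # length field + type + payload + CRC
        if chunk_type == b"IEND":
            return len(data) - pos
    return None


def chunk(chunk_type, body):
    """Build a PNG chunk with a placeholder CRC (never checked here)."""
    return struct.pack(">I", len(body)) + chunk_type + body + b"\x00" * 4


tiny_png = SIGNATURES["png"] + chunk(b"IHDR", b"\x00" * 13) + chunk(b"IEND", b"")
print(detect_type(tiny_png), png_excess_bytes(tiny_png + b"hidden"))  # → png 6
```

A non-zero count is only a warning sign, since some tools append harmless metadata after the end marker.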

And I'll rename the gatherJSONData to clearJSONData since the JSON caching is going to become an integral portion of the script for gathering embedded contents and text data. If it's set to "true" then that'll just tell the script to delete any JSON data that was grabbed upon exiting (meaning that I do have to set up that interrupt keypress combination in some capacity).

Dracovian, Dec 30 '20 20:12

Okay, I decided to use a documented API function to retrieve the latest post in a channel and tested it out on a dead channel that hasn't seen a post since May 30, 2019: https://imgur.com/nkUI9zY
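The comment does not say which API function is used. One possibility, offered here purely as an assumption, is reading the channel object's last_message_id and decoding the timestamp embedded in that snowflake ID, which tells the script how many empty days it can skip. The helper names below are hypothetical.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z in Unix milliseconds


def snowflake_to_datetime(snowflake):
    """Decode the creation time stored in the upper bits of a snowflake ID."""
    ms = (int(snowflake) >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def days_to_skip(last_message_id, now):
    """Whole days between the channel's latest post and now."""
    return (now.date() - snowflake_to_datetime(last_message_id).date()).days


# Build an example ID for a post made on May 30, 2019 (made-up value).
posted = datetime(2019, 5, 30, 12, 0, tzinfo=timezone.utc)
example_id = (int(posted.timestamp() * 1000) - DISCORD_EPOCH_MS) << 22

now = datetime(2020, 12, 30, 22, 0, tzinfo=timezone.utc)
print(snowflake_to_datetime(example_id).date(), days_to_skip(example_id, now))
```

Starting the day-by-day scan at the decoded date costs one extra request per channel at startup and avoids every request for the days after it.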

This means that the startup will take a bit longer but it should increase the runtime performance overall when dealing with multiple channels and is more elegant than having to do it all through the configuration file.


And this is how it looks on the latest public release of macOS Big Sur: https://imgur.com/H2lbEe6

Dracovian, Dec 30 '20 22:12

Awesome! You are the best! How can I use this?

judicalp, Dec 30 '20 22:12

I haven't finished implementing it just yet, but I do plan on pushing it alongside several other features in the next update to the experimental branch.

If I get through with the things I really want to do, then that's when I'll merge the whole thing into the main branch and close a ton of open issues with a comment on each of them asking for new issues to be opened on the new main branch code to prevent any future confusion (and because I hope that these updates will end up fixing said issues as well).

Dracovian, Dec 30 '20 22:12

Okay, I have made it so that SIGINT signals are caught during runtime, but the script continues to execute despite being told to exit upon SIGINT: https://imgur.com/nXEVKAZ
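A common pattern for this situation is to have the handler set a flag and let the main loop check that flag between units of work. Installing a custom handler replaces Python's default one, so KeyboardInterrupt is no longer raised and the loop has to stop itself. The sketch below is a generic illustration with hypothetical names, not the experimental branch's code. In a macOS terminal the interrupt key is Control+C, the same as on Linux.

```python
import signal
import threading

stop_requested = threading.Event()


def handle_sigint(signum, frame):
    """Record the request to stop; the loop decides when to exit."""
    stop_requested.set()


# Replace the default handler (which raises KeyboardInterrupt).
signal.signal(signal.SIGINT, handle_sigint)


def scrape(days, work):
    """Run work(day) for each day until a stop has been requested."""
    finished = []
    for day in days:
        if stop_requested.is_set():
            break  # leave the loop cleanly instead of continuing
        work(day)
        finished.append(day)
    return finished


def demo_work(day):
    """Simulate Ctrl+C arriving while day 2 is being processed."""
    if day == 2:
        handle_sigint(signal.SIGINT, None)


print(scrape([1, 2, 3, 4], demo_work))  # → [1, 2]
```

Because the check happens between days, the day in progress finishes first, which also leaves room for cleanup such as deleting cached JSON before exiting.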

This screenshot also showcases the added security on the experimental branch code compared with the main branch code. That's one big plus because beforehand there could have been major issues with token headers being sent to sites that weren't owned by Discord (this is likely why Discord hates it whenever people use their own authorization token for things like this).
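The idea of keeping the token away from third-party sites can be sketched as a host allowlist applied before any header is built. The allowlist contents, the function name headers_for, and the User-Agent value are assumptions for illustration; the thread does not show how the experimental branch enforces this.

```python
from urllib.parse import urlsplit

# Assumed allowlist of hosts that may receive the authorization token.
TRUSTED_HOSTS = {"discord.com", "discordapp.com"}


def headers_for(url, token):
    """Build request headers, adding the token only for trusted HTTPS hosts."""
    headers = {"User-Agent": "example-scraper"}
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.hostname in TRUSTED_HOSTS:
        headers["Authorization"] = token
    return headers


print("Authorization" in headers_for("https://discordapp.com/api/v8/guilds/1", "t"))  # → True
print("Authorization" in headers_for("https://i.imgur.com/ufPJZKx.png", "t"))  # → False
```

Matching the exact hostname, rather than checking whether the URL merely contains "discordapp.com", avoids leaking the token to lookalike domains.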

Dracovian, Dec 30 '20 23:12