Maigret does not save PDF reports

Open Alexell opened this issue 3 years ago • 12 comments

Checklist

  • [v] I'm reporting a bug in Maigret functionality
  • [v] I've checked for similar bug reports including closed ones
  • [v] I've checked for pull requests that attempt to fix this bug

Description

Info about the Maigret version you are running and environment (--version, operating system, ISP provider):

  • maigret 0.4.3
  • Socid-extractor: 0.0.23
  • Aiohttp: 3.8.1
  • Requests: 2.27.1
  • Python: 3.8.10

How to reproduce this bug (commandline options / conditions):

  • Maigret searches the sites in normal mode, without errors.
  • Maigret does not save the PDF report.
  • Only an empty file, report_username.pdf, is created.
  • HTML reports are saved normally.

Errors at the end of execution (I don't know if they are the cause of the problem):

/usr/local/lib/python3.8/dist-packages/dateutil/parser/_parser.py:1207: UnknownTimezoneWarning: tzname CDT identified but not understood. Pass tzinfos argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood. "
Traceback (most recent call last):
  File "/usr/local/bin/maigret", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.8/dist-packages/maigret/maigret.py", line 723, in run
    loop.run_until_complete(main())
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.8/dist-packages/maigret/maigret.py", line 701, in main
    save_pdf_report(filename, report_context)
  File "/usr/local/lib/python3.8/dist-packages/maigret/report.py", line 82, in save_pdf_report
    pisa.pisaDocument(io.StringIO(filled_template), dest=f, default_css=css)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/document.py", line 104, in pisaDocument
    context = pisaStory(src, path, link_callback, debug, default_css, xhtml,
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/document.py", line 67, in pisaStory
    pisaParser(src, context, default_css, xhtml, encoding, xml_output)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 761, in pisaParser
    pisaLoop(document, context)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 699, in pisaLoop
    pisaLoop(node, context, path, **kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 643, in pisaLoop
    pisaLoop(nnode, context, path, **kw)
  [Previous line repeated 7 more times]
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 513, in pisaLoop
    attr = pisaGetAttributes(context, node.tagName, node.attributes)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/parser.py", line 125, in pisaGetAttributes
    nv = c.getFile(nv)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/context.py", line 795, in getFile
    return getFile(name, relative or self.pathDirectory)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/util.py", line 762, in getFile
    file = pisaFileObject(*a, **kw)
  File "/usr/local/lib/python3.8/dist-packages/xhtml2pdf/util.py", line 665, in __init__
    conn.request("GET", path)
  File "/usr/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 1425, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: WRONG_SIGNATURE_TYPE] wrong signature type (_ssl.c:1131)
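
The 0-byte report_username.pdf is consistent with the output file being opened before rendering starts and the exception interrupting the write. Below is a minimal sketch of one way to avoid leaving an empty file behind. It is not Maigret's actual code: `save_report_atomically` and the `render` callable are hypothetical names, and `render` only stands in for whatever call produces the PDF bytes.

```python
import io
import os
import tempfile


def save_report_atomically(path, render):
    """Render into memory first; touch `path` only if rendering succeeded.

    `render` is a hypothetical callable that writes PDF bytes into the
    binary file-like object it receives and raises on failure.
    """
    buffer = io.BytesIO()
    render(buffer)  # an exception here propagates before any file is created

    # Write to a temporary file in the same directory, then rename it,
    # so a partially written report is never visible under `path`.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(buffer.getvalue())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this shape, a failing render still raises the same exception, but no empty report file is left on disk.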

Alexell avatar Apr 24 '22 13:04 Alexell

Hey, please, specify the username you've searched for.

soxoj avatar Apr 24 '22 14:04 soxoj

@soxoj It's my username, as you see it here.

Alexell avatar Apr 24 '22 18:04 Alexell

I can't reproduce the report creation crash for now; I only get the unknown timezone warning:

[-] Generating report info...
/usr/local/lib/python3.9/site-packages/dateutil/parser/_parser.py:1207: UnknownTimezoneWarning: tzname CDT identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  warnings.warn("tzname {tzname} identified but not understood.  "
[-] HTML report on all usernames saved in /tmp/report_Alexell_plain.html

Could you attach a list of your packages with versions, generated with pip3 freeze > pkgs.txt? I'll try to reproduce your full environment.

soxoj avatar Apr 24 '22 19:04 soxoj

In your screenshot, I see a message about the HTML report. My HTML report is also saved normally with this username. The PDF report is not saved with the same username (command: maigret alexell --pdf). I am attaching what you asked for: pkgs.txt

Alexell avatar Apr 25 '22 04:04 Alexell

I guess the problem is caused by an HTTP error while xhtml2pdf is trying to download and render some profile image by URL. But I still couldn't reproduce it :( Let's try to localize the site. Does the following command crash? maigret alexell --pdf --retries 0 --top-sites 100 --no-recursion If yes, please send the console output.
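
One way to test this guess without rendering a PDF is to collect the remote image URLs from an HTML document and try to download each one. The following is a rough sketch using only the standard library. The names `ImageSourceCollector`, `fetch`, and `find_failing_images` are made up for illustration, and `fetch` is only a stand-in for the download xhtml2pdf performs, so a URL that fails here is a hint rather than proof.

```python
import urllib.request
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every remote <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            # Only remote images trigger a network request.
            if src and src.startswith(("http://", "https://")):
                self.sources.append(src)


def fetch(url, timeout=10):
    """Download a URL with the standard library (a stand-in fetcher)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def find_failing_images(html, fetcher=fetch):
    """Return (url, error) pairs for remote images that cannot be fetched."""
    collector = ImageSourceCollector()
    collector.feed(html)
    failures = []
    for url in collector.sources:
        try:
            fetcher(url)
        except OSError as error:
            # ssl.SSLError and urllib.error.URLError are both OSError subclasses.
            failures.append((url, repr(error)))
    return failures
```

It could be run on the saved HTML report, assuming that report references the same profile images as the PDF template does.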

soxoj avatar Apr 28 '22 22:04 soxoj

The PDF report for this command is generated normally. However, the run was short, and the report is much shorter than the HTML report produced by a run without additional arguments. Apparently there is still some problem site, but the program does not reach it with these options.

Alexell avatar Apr 29 '22 05:04 Alexell

Well, so let's try different modes :)

  1. maigret alexell --pdf --retries 0 -a --no-recursion
  2. maigret alexellpro --pdf --retries 0 -a --no-recursion

soxoj avatar Apr 29 '22 15:04 soxoj

  1. After execution, we get the same errors as at the very beginning and a report with a size of 0 bytes.
  2. The report was saved normally.

Alexell avatar Apr 30 '22 08:04 Alexell

Okay, let's increase the number of sites step by step, e.g.: maigret alexell --pdf --retries 0 -a --no-recursion --top-sites 200 Please attach a text file with the full console output after reproducing the error.
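
The step-by-step increase can also be done as a binary search over the --top-sites value. Here is a sketch under two assumptions: the crash makes maigret exit with a non-zero status (an unhandled Python exception normally does), and once the problem site is included, every larger value fails too. The helper names are hypothetical; the command-line options are the ones used in this thread.

```python
import subprocess


def build_command(username, top_sites):
    """Build the command line from this thread for a given --top-sites value."""
    return [
        "maigret", username, "--pdf", "--retries", "0",
        "-a", "--no-recursion", "--top-sites", str(top_sites),
    ]


def run_fails(username, top_sites):
    """Return True if the maigret run exits with a non-zero status.

    Assumption: the crash shown in the traceback ends the process
    with a non-zero exit code.
    """
    result = subprocess.run(build_command(username, top_sites))
    return result.returncode != 0


def smallest_failing_count(low, high, fails):
    """Binary-search the smallest N in (low, high] for which fails(N) is True.

    Requires fails(low) to be False and fails(high) to be True, and assumes
    that every N above the first failing one fails as well.
    """
    while high - low > 1:
        middle = (low + high) // 2
        if fails(middle):
            high = middle
        else:
            low = middle
    return high
```

For example, if 100 sites work and 200 are known to fail, `smallest_failing_count(100, 200, lambda n: run_fails("alexell", n))` would narrow the problem down to a single position in the site ranking. Site responses can vary between runs, so a result found this way is worth re-checking.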

soxoj avatar Apr 30 '22 08:04 soxoj

console_log.txt

Alexell avatar Apr 30 '22 10:04 Alexell

Thanks, got it, let's check the following sites: maigret alexell --site Flickr --site Pastebin --site BuzzFeed --site Tinder --site MixCloud --site BitBucket --site last.fm --site Gravatar --site uID.me --site Paypal --site Kik --pdf

soxoj avatar Apr 30 '22 10:04 soxoj

With these sites, the report was saved normally.

Alexell avatar Apr 30 '22 10:04 Alexell