email2pdf Adding basic email headers breaks HTML

Open broth-itk opened this issue 4 years ago • 3 comments

The payload contains valid HTML. The headers are prepended before the <html> start tag, which makes the resulting document invalid HTML:

if args.headers:
    header_info = get_formatted_header_info(input_email)
    logger.info("Header info is: " + header_info)
    payload = header_info + payload

Sep 05 '20 22:09 broth-itk

Proposal:

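# Requires: from bs4 import BeautifulSoup
if args.headers:
    header_info = get_formatted_header_info(input_email)
    logger.info("Header info is: " + header_info)
    # Parse the payload and place the headers inside <body> rather than before <html>.
    soup = BeautifulSoup(payload, "html.parser")
    # Index 0 makes the headers the first child of <body>.
    soup.body.insert(0, BeautifulSoup(header_info, "html.parser"))
    payload = str(soup)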

Sep 05 '20 22:09 broth-itk

@broth-itk I think I get the general intent here, but what specific problem are you trying to solve? In practice, the current approach seems to work for me. I'm cautious about running the email body through the BeautifulSoup parser unless there's a compelling reason to do so.

Sep 06 '20 08:09 andrewferrier

@andrewferrier: Thanks for your feedback! I just wanted to point out that adding the headers makes the HTML invalid. wkhtmltopdf seems to handle this fine, but IMHO we should feed it proper HTML. In my case, I modified the script to add yet another header to the resulting HTML, which might get me into trouble.

I'm not a fan of sending everything through the BS parser either. Maybe it's simpler to search the string for the <body> tag and insert the code right there. That should have the same effect, IMHO.
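A minimal sketch of the string-search idea, for illustration only. The helper name insert_headers_into_body and the fallback behaviour are assumptions, not part of email2pdf. It uses only the standard library and assumes the opening <body> tag has no ">" inside its attribute values and that no earlier "<body" text appears in a comment or script.

```python
import re

# Matches an opening <body> tag with optional attributes, case-insensitively.
_BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def insert_headers_into_body(payload: str, header_info: str) -> str:
    """Insert header_info right after the opening <body> tag.

    Hypothetical helper. If no <body> tag is found, fall back to
    prepending, which is what the current code does.
    """
    match = _BODY_OPEN.search(payload)
    if match is None:
        return header_info + payload
    end = match.end()
    return payload[:end] + header_info + payload[end:]


# Usage: the header markup lands inside <body>, not before <html>.
html = '<html><head></head><body class="mail"><p>Hi</p></body></html>'
result = insert_headers_into_body(html, "<table><tr><td>From: a@b</td></tr></table>")
```

This leaves the rest of the email body byte-for-byte untouched, which sidesteps the concern about re-serialising it through a parser. The trade-off is that a regular expression can be fooled by unusual markup.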

Just my 2 cents

Sep 06 '20 09:09 broth-itk
