email2pdf
email2pdf copied to clipboard
Adding basic email headers breaks HTML
Payload has valid HTML code.
The headers will be added in front of the <html>
start tag which breaks HTML standard:
if args.headers:
header_info = get_formatted_header_info(input_email)
logger.info("Header info is: " + header_info)
payload = header_info + payload
Proposal:
if args.headers:
header_info = get_formatted_header_info(input_email)
logger.info("Header info is: " + header_info)
soup = BeautifulSoup(payload, "html.parser")
soup.body.insert(1, BeautifulSoup(header_info, 'html.parser'))
payload = str(soup)
@broth-itk I think I get the general intent here, but what kind of problem are you trying to solve specifically? In practice, this generally speaking seems to work for me. I'm cautious about running the email body through the BS parser unless there's a compelling reason to do so.
@andrewferrier: Thanks for your feedback! I just wanted to point out that the HTML code will be invalidated when adding headers. wkhtmltopdf does seem to handle that issue fine but IMHO we should feed it with proper HTML code. In my case I modified the script to add even another header to the resulting HTML code. This might get me into troubles.
I am not a fan sending all though BS parser as well. Maybe it's simpler to search the string for <body>
tag and insert the code right there. Will have the same effect IMHO.
Just my 2 cents