imapfilter

Best way to decode utf8 headers

Open leipert opened this issue 7 years ago • 4 comments

First of all: Thank you for a great project!

I implemented a custom sorting mechanism that iterates over every message in a mailbox and uses :fetch_field to fetch headers, which are then compared against rules. The problem is that some mails have headers sent as UTF-8 encoded-words (=?utf-8?q?...?=) rather than plain text.

value = src:fetch_field(field):sub(start):lower()
print(value)

This logs the following (with imapfilter -v):

S (8): 100D OK SEARCH completed (Success)
C (8): 100E UID FETCH 11 BODY.PEEK[HEADER.FIELDS (Subject)]
S (8): 100E OK Success
Fetched field "Subject" of [email protected]@imap.gmail.com/temp_inbox[11].
=?utf-8?q?=5bslack=5d_notifications_from_the_company_workspace_for_march_=31=33th=2c_=32=30=31=38_at_=35=3a=32=36_pm?=

What would be the best way to retrieve [Slack] Notifications from the company workspace for March 13th, 2018 at 5:26 PM instead of the encoded string? Is there any helper function I could use, or could you expose one, if there isn't?

leipert, Mar 13 '18 16:03

Addition: I did not try options.charset = 'UTF-8', but some messages are ISO encoded and some are UTF-8 encoded.

Thank you very much!

leipert, Mar 13 '18 16:03
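On the mixed-charset point: each encoded-word names its own charset, so it can be inspected word by word. The following is only a rough sketch; parse_encoded_word and latin1_to_utf8 are hypothetical helper names, not imapfilter functions, and the conversion covers ISO-8859-1 only (other ISO-8859 variants map some bytes differently).

```lua
-- Split one encoded-word ("=?charset?encoding?payload?=") into its parts.
-- Hypothetical helper, not part of imapfilter.
function parse_encoded_word(word)
  local charset, encoding, payload =
    word:match("^=%?([^?]+)%?([bBqQ])%?([^?]*)%?=$")
  if not charset then
    return nil -- not an encoded-word
  end
  return charset:lower(), encoding:upper(), payload
end

-- Convert already-decoded ISO-8859-1 bytes to UTF-8.
-- Every byte >= 128 becomes a two-byte UTF-8 sequence.
function latin1_to_utf8(bytes)
  local out = bytes:gsub("[\128-\255]", function(ch)
    local b = ch:byte()
    return string.char(192 + math.floor(b / 64), 128 + b % 64)
  end)
  return out
end

print(parse_encoded_word("=?ISO-8859-1?Q?Andr=E9?="))
```

The charset returned by parse_encoded_word can then decide whether the decoded payload needs a conversion step such as latin1_to_utf8 before it is compared to rules.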

I wrote this helper function:

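-- Prefix of a UTF-8 "Q" encoded-word (matched case-insensitively)
magicQ = "=?utf-8?q?"
magicQLength = string.len(magicQ)

function qdecode(value)
  if value == nil then
    return value
  end
  if string.sub(value, 1, magicQLength):lower() == magicQ then
    -- The outer parentheses drop gsub's second return value (the match count)
    return (value:sub(magicQLength + 1)
              :gsub("%?=%s*$", "") -- strip the closing "?=" and trailing whitespace
              :gsub("_", " ")
              :gsub(
                "=([a-fA-F0-9][a-fA-F0-9])",
                function (byte)
                  -- each =XX escape is one raw byte, not a code point
                  return string.char(tonumber(byte, 16))
                end
              ))
  end
  return value
end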

If there isn't any better way, feel free to close this issue :)

leipert, Mar 16 '18 09:03
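One detail worth spelling out for Q-encoded text: each =XX escape stands for a single byte, so a non-ASCII character in UTF-8 spans several escapes and has to be rebuilt byte by byte. A minimal sketch, assuming the text between "=?utf-8?q?" and "?=" has already been cut out (q_unescape is a hypothetical name):

```lua
-- Decode the payload of a "Q" encoded-word.
-- Underscores are replaced first, so that an escaped "=5F"
-- still comes out as a literal underscore afterwards.
function q_unescape(text)
  local decoded = text
    :gsub("_", " ")
    :gsub("=(%x%x)", function(hex)
      return string.char(tonumber(hex, 16))
    end)
  return decoded
end

print(q_unescape("caf=C3=A9_au_lait")) -- the two escapes form one UTF-8 "é"
```

Using utf8.char on each escape would treat 0xC3 and 0xA9 as two separate code points and produce four bytes instead of two.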

Your script didn't work for me as I encountered headers with embedded encoded sections, as well as "B" (base64) encodings, so I decided to modify it a bit:

function hdr_decode(s)
	local i, j = s:lower():find("=?utf-8?q?", 1, true);
	if i then
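		-- Search from j+1: the encoded text may start with "=" (e.g. "=5b"),
		-- in which case "?=" would wrongly match at position j.
		local k, l = s:find("?=", j+1, true);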
		local s_ = s
			:sub(j+1, k-1)
			:gsub("_", " ")
			:gsub("=([a-fA-F0-9][a-fA-F0-9])", function(c) return string.char(tonumber(c, 16)) end);
		return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1))
	end

	i, j = s:lower():find("=?utf-8?b?", 1, true);
	if i then
		local k, l = s:find("?=", j, true);
		local s_ = s:sub(j+1, k-1):gsub("[%w%+/][%w%+/][%w%+/=][%w%+/=]",
			function(w)
				local digits = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
				local a = digits:find(w:sub(1, 1), 1, true);
				local b = digits:find(w:sub(2, 2), 1, true);
				local c = digits:find(w:sub(3, 3), 1, true);
				local d = digits:find(w:sub(4, 4), 1, true);
				return string.char(
					(a-1)*4 + math.floor((b-1)/16),
					(b-1)%16*16 + math.floor(((c or 1)-1)/4),
					((c or 1)-1)%4*64 + ((d or 1)-1)
				):sub(1, d and 3 or c and 2 or 1);
			end
		);
		return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1));
	end

	return s;
end

print(hdr_decode("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"));
-- > From: Hello, World! <hello_world@example.com>
os.exit(0);

Note: The recursive approach is slow for larger strings, but it should work well enough for e-mail headers. Also, failure will not be graceful if the input is not well-formed. Finally, I'm using string.char instead of utf8.char because I'm currently locked to Lua 5.1 (get your act together, Gentoo, grr...). Since both the =XX escapes and the base64 groups decode to raw bytes rather than code points, string.char is also the right function on newer Lua versions.

Feel free to use / improve further =)

SOwOphie, Jun 06 '20 21:06
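For the two caveats in the note above (recursion cost and malformed input), one possible variation is a single gsub pass over the header: anything that does not look like a complete encoded-word never matches and is left untouched. This is only a sketch under a few assumptions: decode_header, q_unescape and b64_decode are hypothetical names, only the utf-8 charset is decoded, and whitespace between adjacent encoded-words is not removed.

```lua
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- "Q" payload: "_" is a space, "=XX" is one raw byte.
function q_unescape(text)
  local decoded = text:gsub("_", " "):gsub("=(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end)
  return decoded
end

-- "B" payload: accumulate 6 bits per character, emit a byte once 8 are available.
-- Padding and any other non-alphabet characters are skipped.
function b64_decode(text)
  local out, bits, nbits = {}, 0, 0
  for ch in text:gmatch("[A-Za-z0-9+/]") do
    bits = bits * 64 + (B64:find(ch, 1, true) - 1)
    nbits = nbits + 6
    if nbits >= 8 then
      nbits = nbits - 8
      local byte = math.floor(bits / 2 ^ nbits)
      out[#out + 1] = string.char(byte)
      bits = bits - byte * 2 ^ nbits
    end
  end
  return table.concat(out)
end

-- Replace every complete utf-8 encoded-word in one pass.
function decode_header(s)
  local decoded = s:gsub("=%?([^?]+)%?([bBqQ])%?([^?]*)%?=",
    function(charset, encoding, text)
      if charset:lower() ~= "utf-8" then
        return nil -- returning nil keeps the original match
      end
      if encoding:lower() == "q" then
        return q_unescape(text)
      end
      return b64_decode(text)
    end)
  return decoded
end

print(decode_header("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"))
```

Because the pattern requires the closing "?=", a truncated header such as "=?utf-8?q?broken" is returned unchanged instead of raising an error.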

Thank you both for the helpers. I used LadyBoonami's code and it worked just fine. Thank you!

hi-flyer, Jul 17 '21 20:07