imapfilter
Best way to decode utf8 headers
First of all: Thank you for a great project!
I implemented a custom sorting mechanism that iterates over every message in a mailbox and uses :fetch_field to fetch headers, which are then compared against rules. The problem is that some mails have UTF-8 encoded headers.
value = src:fetch_field(field):sub(start):lower()
print(value)
With imapfilter -v, this logs the following:
S (8): 100D OK SEARCH completed (Success)
C (8): 100E UID FETCH 11 BODY.PEEK[HEADER.FIELDS (Subject)]
S (8): 100E OK Success
Fetched field "Subject" of [email protected]@imap.gmail.com/temp_inbox[11].
=?utf-8?q?=5bslack=5d_notifications_from_the_company_workspace_for_march_=31=33th=2c_=32=30=31=38_at_=35=3a=32=36_pm?=
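The :sub(start) call in the snippet above strips the field name by position. A slightly more robust alternative is sketched below. It assumes, as that call suggests, that the string returned by `fetch_field` begins with the field name and a colon. It strips everything up to the first colon and also unfolds the header: per RFC 5322, a long header may be folded across lines (a line break followed by whitespace), and long subjects are usually split into several encoded words that way. `header_value` is a hypothetical helper, not part of imapfilter.

```lua
-- Hypothetical helper: turn the raw text of one header field, e.g.
-- "Subject: =?utf-8?q?part_1?=\r\n =?utf-8?q?part_2?=\r\n", into its value.
-- Assumes the string starts with the field name and a colon.
function header_value(raw)
  local value = raw
    :gsub("\r?\n([ \t])", "%1")  -- unfold: line break followed by space/tab (RFC 5322, 2.2.3)
    :gsub("^[^:]*:%s*", "")      -- drop the field name, the colon and following spaces
    :gsub("%s+$", "")            -- drop a trailing line break, if any
  return value
end

print(header_value("Subject: =?utf-8?q?part_1?=\r\n =?utf-8?q?part_2?=\r\n"))
-- =?utf-8?q?part_1?= =?utf-8?q?part_2?=
```

If lowercased text is wanted for rule matching, it is safer to lowercase after decoding. Applying :lower() to the raw value, as the snippet does, is harmless for Q words (their hex escapes are case-insensitive), but it corrupts Base64 ("B") words, where letter case carries data.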
What would be the best way to get "[Slack] Notifications from the company workspace for March 13th, 2018 at 5:26 PM" instead of the encoded string? Is there a helper function I could use, or, if there isn't, could you expose one?
Addendum: I haven't tried options.charset = 'UTF-8', but note that some messages are ISO-encoded and some are UTF-8-encoded.
Thank you very much!
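On the mixed-charset point: Q and B decoding only recover the raw bytes of the encoded text. A header such as `=?iso-8859-1?q?caf=E9?=` therefore decodes to the single byte 0xE9, which is not valid UTF-8. If "ISO" here means ISO-8859-1 (Latin-1), the conversion is simple because every Latin-1 byte equals its Unicode code point. The sketch below encodes bytes 0x80 to 0xFF as two-byte UTF-8 sequences. The helper name `latin1_to_utf8` is hypothetical, and the code avoids the `utf8` library so it also runs on Lua 5.1. Other ISO-8859 parts, such as ISO-8859-15, differ for some bytes and would need a lookup table instead.

```lua
-- Hypothetical helper: convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
-- Assumes the input really is Latin-1; ASCII bytes pass through unchanged.
function latin1_to_utf8(s)
  return (s:gsub("[\128-\255]", function(ch)
    local b = ch:byte()
    -- code point U+0080..U+00FF becomes 110000xx 10xxxxxx
    return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
  end))
end

print(latin1_to_utf8("caf\233"))  -- café (on a UTF-8 terminal)
```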
I wrote this helper function:
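magicQ = "=?utf-8?q?"
magicQLength = string.len(magicQ)
function qdecode(value)
  if value == nil then
    return value
  end
  -- only handles a value that is a single Q-encoded word: "=?utf-8?q?...?="
  if value:sub(1, magicQLength):lower() == magicQ and value:sub(-2) == "?=" then
    local decoded = value:sub(magicQLength + 1, -3)
      :gsub("_", " ")
      :gsub(
        "=([a-fA-F0-9][a-fA-F0-9])",
        function (byte)
          -- each =XX is one byte of the UTF-8 sequence, not a code point
          return string.char(tonumber(byte, 16))
        end
      )
    return decoded
  end
  return value
end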
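In Q encoding each `=XX` escape is one byte in the word's charset, not a Unicode code point, so a UTF-8 character such as é arrives as two escapes, `=C3=A9`. Passing each byte to `utf8.char` would re-encode it as a separate code point (0xC3 would become the two bytes C3 83), while `string.char` reproduces the original bytes. A minimal sketch of decoding just the encoded-text part this way follows; the helper name `q_decode_text` is hypothetical.

```lua
-- Hypothetical helper: decode the encoded-text part of a Q-encoded word
-- (RFC 2047, section 4.2) into raw bytes.
function q_decode_text(text)
  local decoded = text
    :gsub("_", " ")                          -- "_" always stands for a space
    :gsub("=(%x%x)", function(hex)
      return string.char(tonumber(hex, 16))  -- one byte per escape
    end)
  return decoded
end

print(q_decode_text("caf=C3=A9_au_lait"))    -- café au lait (on a UTF-8 terminal)
print(q_decode_text("=5bslack=5d_notifications_from_the_company_workspace"))
-- [slack] notifications from the company workspace
```

The order of the two substitutions matters: underscores are turned into spaces first, so an escaped underscore (`=5F`) survives as a literal `_`.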
If there isn't any better way, feel free to close this issue :)
Your script didn't work for me: I ran into headers with encoded sections embedded in plain text, as well as "B" (Base64) encodings, so I modified it a bit:
function hdr_decode(s)
  -- Q encoding: "_" is a space and "=XX" is one hex-encoded byte
  local i, j = s:lower():find("=?utf-8?q?", 1, true);
  if i then
    -- search from j+1, otherwise a leading "=XX" right after "?q?" is mistaken for the closing "?="
    local k, l = s:find("?=", j + 1, true);
    if not k then return s end -- unterminated encoded word: leave the header as-is
    local s_ = s
      :sub(j+1, k-1)
      :gsub("_", " ")
      :gsub("=([a-fA-F0-9][a-fA-F0-9])", function(c) return string.char(tonumber(c, 16)) end);
    return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1))
  end
  -- B encoding: Base64, every four characters decode to up to three bytes
  i, j = s:lower():find("=?utf-8?b?", 1, true);
  if i then
    local k, l = s:find("?=", j + 1, true);
    if not k then return s end
    local s_ = s:sub(j+1, k-1):gsub("[%w%+/][%w%+/][%w%+/=][%w%+/=]",
      function(w)
        local digits = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        -- 1-based positions in digits; c and d are nil for "=" padding
        local a = digits:find(w:sub(1, 1), 1, true);
        local b = digits:find(w:sub(2, 2), 1, true);
        local c = digits:find(w:sub(3, 3), 1, true);
        local d = digits:find(w:sub(4, 4), 1, true);
        return string.char(
          (a-1)*4 + math.floor((b-1)/16),
          (b-1)%16*16 + math.floor(((c or 1)-1)/4),
          ((c or 1)-1)%4*64 + ((d or 1)-1)
        ):sub(1, d and 3 or c and 2 or 1);
      end
    );
    return hdr_decode(s:sub(1, i-1) .. s_ .. s:sub(l+1));
  end
  return s;
end
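One case the function above does not cover: RFC 2047 (section 6.2) says whitespace between two adjacent encoded words is not part of the decoded text, and long subjects are routinely split into several encoded words separated by a folded line. `hdr_decode` keeps that whitespace, so `=?utf-8?q?foo?= =?utf-8?q?bar?=` comes out as `foo bar` rather than `foobar`. A minimal pre-processing sketch that removes such whitespace before decoding follows; the helper name is hypothetical.

```lua
-- Hypothetical helper: drop whitespace (including folded line breaks) that sits
-- between the "?=" ending one encoded word and the "=?" starting the next.
function join_adjacent_encoded_words(s)
  return (s:gsub("(%?=)%s+(=%?)", "%1%2"))
end

print(join_adjacent_encoded_words("=?utf-8?q?foo?= =?utf-8?q?bar?="))
-- =?utf-8?q?foo?==?utf-8?q?bar?=
```

Calling hdr_decode(join_adjacent_encoded_words(s)) then decodes the joined words one after another. The pattern only matches `?=`, whitespace, `=?`, so whitespace between an encoded word and ordinary text is kept, as the RFC requires.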
print(hdr_decode("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"));
-- > From: Hello, World! <hello_world@example.com>
os.exit(0);
Note: The recursive approach is slow for larger strings, but it should work well enough for e-mail headers. Also, failure may not be graceful if the input is not well-formed. Finally, I'm using string.char rather than utf8.char because I'm currently locked to Lua 5.1 (get your act together, Gentoo, grr...). That is also the correct choice here: each =XX escape or Base64 group yields raw bytes of the UTF-8 sequence, not code points, so utf8.char would mangle non-ASCII characters.
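Regarding the recursion note: a single `gsub` over a pattern that matches a whole encoded word avoids re-scanning and rebuilding the string once per word. The same pattern also captures the charset and the encoding letter, so Q and B words in any capitalization are handled in one place. The sketch below is an alternative to, not a copy of, the code posted above. `decode_header` and its helpers are hypothetical names. It assumes that only UTF-8 and ISO-8859-1 words need decoding (the latter converted to UTF-8), leaves words in any other charset untouched, and uses only Lua 5.1 string functions.

```lua
-- Single-pass RFC 2047 header decoder (sketch). Hypothetical helper names.
-- Decodes Q and B encoded words in the UTF-8 and ISO-8859-1 charsets;
-- words in any other charset are left exactly as they were.
local B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

-- Q encoding: "_" is a space, "=XX" is one byte
local function q_decode(text)
  local out = text:gsub("_", " "):gsub("=(%x%x)", function(hex)
    return string.char(tonumber(hex, 16))
  end)
  return out
end

-- B encoding: standard Base64; "=" padding is dropped, and a final group of
-- 2 or 3 characters yields 1 or 2 bytes
local function b_decode(text)
  text = text:gsub("[^%w%+/]", "")
  local out = {}
  for i = 1, #text, 4 do
    local group = text:sub(i, i + 3)
    local n = 0
    for k = 1, 4 do
      local c = group:sub(k, k)
      local v = 0
      if c ~= "" then v = B64:find(c, 1, true) - 1 end
      n = n * 64 + v                           -- accumulate 24 bits
    end
    local bytes = string.char(math.floor(n / 65536), math.floor(n / 256) % 256, n % 256)
    out[#out + 1] = bytes:sub(1, #group - 1)
  end
  return table.concat(out)
end

-- ISO-8859-1 byte b is code point U+00XX; encode 0x80..0xFF as two UTF-8 bytes
local function latin1_to_utf8(s)
  return (s:gsub("[\128-\255]", function(ch)
    local b = ch:byte()
    return string.char(0xC0 + math.floor(b / 64), 0x80 + b % 64)
  end))
end

function decode_header(s)
  -- whitespace between adjacent encoded words is dropped (RFC 2047, section 6.2)
  s = s:gsub("(%?=)%s+(=%?)", "%1%2")
  return (s:gsub("=%?([^?]+)%?([QqBb])%?(.-)%?=", function(charset, enc, text)
    charset = charset:lower()
    if charset ~= "utf-8" and charset ~= "iso-8859-1" then
      return nil                               -- nil keeps the original match unchanged
    end
    local bytes
    if enc:upper() == "Q" then bytes = q_decode(text) else bytes = b_decode(text) end
    if charset == "iso-8859-1" then bytes = latin1_to_utf8(bytes) end
    return bytes
  end))
end

print(decode_header("From: =?utf-8?b?SGVsbG8sIFdvcmxkIQ==?= <=?utf-8?q?hello=5Fworld=40example=2ecom?=>"))
-- From: Hello, World! <hello_world@example.com>
```

Because the pattern consumes the `?` after the encoding letter before looking for the closing `?=`, encoded text that starts with an `=XX` escape is not cut short.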
Feel free to use / improve further =)
Thank you both for the helpers. I used LadyBoonami's code and it worked just fine. Thank you!