net-imap
💥⚡ Simplify `header-fld-name` parser (backward incompatible)
This speeds up my (newly added) benchmarks by ~15-20% over v0.4.3, and by 22-40% over earlier versions.
NOTE: In every version up to v0.4.3, Net::IMAP recreated the raw original source string. After #217, it slices the raw original source string. After this PR, it returns the decoded astring value. Although this is technically incompatible, it should almost never make a difference: all standard header field names are valid IMAP atoms (see the IANA Message Header Field Names list).
Valid RFC-5322 field names will never require string literals. But they may technically include atom-special characters, in which case they need to be quoted. Note that RFC-6532 (I18N headers) explicitly does not change the RFC-5322 field name syntax.
RFC-5322 syntax:
field-name = 1*ftext
ftext      = %d33-57 /   ; Printable US-ASCII
             %d59-126    ; characters not including
                         ; ":".
Which is matched by the following Regexp:
33.chr => "!"
57.chr => "9"
59.chr => ";"
126.chr => "~"
[*33..57, *59..126].map{_1.chr}.join =>
"!\"\#$%&'()*+,-./0123456789;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
VALID_RFC5322_FIELD_NAME = /\A[!-9;-~]+\z/
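To make the gap between "valid RFC-5322 field name" and "valid IMAP atom" concrete, here is a minimal sketch. It assumes the IMAP `atom-specials` rule from RFC 3501 (`(`, `)`, `{`, SP, CTL, list-wildcards, quoted-specials, resp-specials); `ATOM_SPECIALS` and `needs_quoting?` are hypothetical names for illustration, not part of Net::IMAP.

```ruby
# Same pattern as above: printable US-ASCII except ":".
VALID_RFC5322_FIELD_NAME = /\A[!-9;-~]+\z/

# Assumption: atom-specials per RFC 3501. SP and CTL are already excluded by
# the field-name syntax, so only these printable characters remain:
#   ( ) {   list-wildcards: % *   quoted-specials: " \   resp-specials: ]
ATOM_SPECIALS = /[(){%*"\\\]]/

# Hypothetical helper: true when a syntactically valid field name contains
# an atom-special and therefore cannot be sent as a bare atom.
def needs_quoting?(field_name)
  field_name.match?(VALID_RFC5322_FIELD_NAME) && field_name.match?(ATOM_SPECIALS)
end

needs_quoting?("Subject")          # standard name, usable as a plain atom
needs_quoting?("List-Unsubscribe") # standard name, usable as a plain atom
needs_quoting?("X-Weird(Name)")    # contains "(" and ")", must be quoted
```

Only names for which this check is true are affected by the incompatibility described here.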
A server shouldn't use a quoted string (or a literal) for any standard message header. If one does so unnecessarily, this PR simplifies accessing the result by normalizing the field name back to its atom form.
The real incompatibility occurs when fetching non-standard but syntactically valid RFC-5322 field names, containing atom specials, which need to be quoted. But the workaround is simple.
For example, with the following non-standard (but syntactically valid) field names:
field_names = %w[
\\Foo%
("BAR")
]
field_names.all?(VALID_RFC5322_FIELD_NAME) => true
The current version of Net::IMAP#fetch doesn't quote any attrs, so in order to fetch these we'd need to manually quote them:
quoted_names = field_names
  .map { '"' + _1.gsub(/["\\]/) { "\\#$&" } + '"' } # escape \ and ", then wrap in quotes
  .join(" ")
joined_names = field_names.join(" ")
joined_names => "\\Foo% (\"BAR\")"
quoted_names => "\"\\\\Foo%\" \"(\\\"BAR\\\")\""
imap.fetch(1, "BODY[HEADER.FIELDS (#{quoted_names})]") => [fetch_data]
All of the above is unchanged by this PR. The incompatibility arises when retrieving the results from the FetchData:
# In the current version (v0.4.3), access using quoted names:
fetch_data.attr["BODY[HEADER.FIELDS (#{quoted_names})]"] => String
# After this PR, access using unquoted names:
fetch_data.attr["BODY[HEADER.FIELDS (#{joined_names})]"] => String
However, I also prepared a version that is backward-compatible, with a smaller performance boost:
- #217
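
For code that has to work both before and after this change, one possible workaround is to try both key forms. The sketch below is only an illustration: `quote_field_name` and `header_fields_attr` are hypothetical helpers, not part of Net::IMAP. It assumes `attr` behaves like a Hash keyed by attribute name, and that on older versions the server echoed the field names quoted the same way they were sent.

```ruby
# Hypothetical helper: escape backslashes and double quotes, then wrap the
# name in double quotes (the same quoting used for the fetch request above).
def quote_field_name(name)
  '"' + name.gsub(/["\\]/) { "\\#$&" } + '"'
end

# Hypothetical helper: look up the header fields attribute under the
# unquoted key (after this PR) and fall back to the quoted key (v0.4.3).
def header_fields_attr(attr, field_names)
  joined_names = field_names.join(" ")
  quoted_names = field_names.map { quote_field_name(_1) }.join(" ")
  attr["BODY[HEADER.FIELDS (#{joined_names})]"] ||
    attr["BODY[HEADER.FIELDS (#{quoted_names})]"]
end

# Usage, with a plain Hash standing in for FetchData#attr:
field_names = %w[\\Foo% ("BAR")]
attr = { "BODY[HEADER.FIELDS (\\Foo% (\"BAR\"))]" => "header text" }
header_fields_attr(attr, field_names)
```

For standard header field names, both keys are identical apart from the quotes, and servers normally return them as atoms, so the fallback only matters for names containing atom-specials.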