net-imap

💥⚡ Simplify `header-fld-name` parser (backward incompatible)

Open · nevans opened this issue 1 year ago · 0 comments

This speeds up my (newly added) benchmarks by ~15-20% over v0.4.3, and by 22-40% over earlier versions.

NOTE: In every version up to v0.4.3, Net::IMAP recreated the raw original source string. After #217, it slices the raw original source string. After this PR, it returns the decoded astring value. Although this is technically incompatible, it should almost never make a difference. See the IANA Message Header Field Names list: every standard header field name is a valid IMAP atom, and an atom's decoded value is identical to its raw source. So I think this incompatibility should almost never occur in practice.

Valid RFC-5322 field names never require string literals, because they contain only printable US-ASCII. But they may technically include atom-special characters, and such names need to be quoted. Note that RFC-6532 (I18N headers) explicitly does not change the RFC-5322 field-name syntax.

RFC-5322 syntax:

```
field-name      =   1*ftext
ftext           =   %d33-57 /          ; Printable US-ASCII
                    %d59-126           ;  characters not including
                                       ;  ":".
```

This is matched by the following Regexp:

```ruby
33.chr  => "!"
57.chr  => "9"
59.chr  => ";"
126.chr => "~"
[*33..57, *59..126].map{_1.chr}.join =>
  "!\"\#$%&'()*+,-./0123456789;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
VALID_RFC5322_FIELD_NAME = /\A[!-9;-~]+\z/
```
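A rough sketch of which field names are affected, assuming the RFC 3501 `atom-specials` set. A valid RFC-5322 field name already excludes SP and CTLs, so only `(`, `)`, `{`, `%`, `*`, `"`, `\` and `]` can force quoting. `imap_atom?` and `ATOM_SPECIALS` are hypothetical names used for illustration; they are not part of Net::IMAP.

```ruby
# Field names that are valid per RFC-5322 (printable US-ASCII, no ":").
VALID_RFC5322_FIELD_NAME = /\A[!-9;-~]+\z/

# RFC 3501 atom-specials that can still appear in a valid RFC-5322 field name.
# (SP and CTLs are already excluded by VALID_RFC5322_FIELD_NAME.)
ATOM_SPECIALS = /[(){%*"\\\]]/

# Hypothetical helper: true when the field name can be sent as an IMAP atom,
# i.e. its raw form and its decoded astring value are identical.
def imap_atom?(name)
  VALID_RFC5322_FIELD_NAME.match?(name) && !ATOM_SPECIALS.match?(name)
end

%w[Subject Message-ID X-Spam-Status].all? { imap_atom?(_1) } # => true
imap_atom?("\\Foo%")   # => false (contains "\" and "%")
imap_atom?('("BAR")')  # => false (contains "(", ")" and DQUOTE)
```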

If a server unnecessarily uses a quoted string (or a literal) for any standard message header field name, which it shouldn't, this PR simplifies accessing the result by normalizing the field name back to its atom form.
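To illustrate the normalization, here is a minimal sketch of decoding an IMAP `astring` given in atom or quoted-string form (literals are omitted). `decode_astring` is a hypothetical helper, not Net::IMAP's actual parser; it only shows why a needlessly quoted standard name decodes to the same value as its atom form, whereas slicing the raw source would keep the quotes.

```ruby
# Hypothetical helper: decode an IMAP astring given as an atom or a quoted string.
# String literals ({n}\r\n...) are deliberately not handled in this sketch.
def decode_astring(raw)
  if raw.length >= 2 && raw.start_with?('"') && raw.end_with?('"')
    # Strip the surrounding DQUOTEs, then unescape \" and \\.
    raw[1...-1].gsub(/\\(["\\])/, '\1')
  else
    raw # an atom's decoded value is its raw source
  end
end

decode_astring("Subject")     # => "Subject"
decode_astring('"Subject"')   # => "Subject"
decode_astring('"\\\\Foo%"')  # => "\\Foo%"
decode_astring('"(\"BAR\")"') # => "(\"BAR\")"
```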

The real incompatibility occurs when fetching non-standard but syntactically valid RFC-5322 field names that contain atom-specials and therefore need to be quoted. The workaround is simple, though.

For example, with the following non-standard (but syntactically valid) field names:

```ruby
field_names = %w[
  \\Foo%
  ("BAR")
]
field_names.all?(VALID_RFC5322_FIELD_NAME) => true
```

The current version of Net::IMAP#fetch doesn't quote any attrs, so in order to fetch these we'd need to manually quote them:

```ruby
# Escape "\" and DQUOTE, then wrap each name in DQUOTEs.
quoted_names = field_names
  .map { '"' + _1.gsub(/["\\]/) { "\\#$&" } + '"' }
  .join(" ")
joined_names = field_names.join(" ")

joined_names => "\\Foo% (\"BAR\")"
quoted_names => "\"\\\\Foo%\" \"(\\\"BAR\\\")\""

imap.fetch(1, "BODY[HEADER.FIELDS (#{quoted_names})]") => [fetch_data]
```
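Because `Net::IMAP#fetch` passes attrs through unchanged, the caller builds the section spec. A minimal sketch, assuming the names are already valid RFC-5322 field names: the hypothetical `imap_astring` helper leaves atoms bare and quotes only names containing an RFC 3501 atom-special. This is conservative for `]`, which `ASTRING-CHAR` technically permits. `header_fields_attr` is likewise a hypothetical name.

```ruby
# RFC 3501 atom-specials that may appear in a valid RFC-5322 field name.
ATOM_SPECIALS = /[(){%*"\\\]]/

# Hypothetical helper: emit a field name as an atom when possible,
# otherwise as a quoted string with "\" and DQUOTE backslash-escaped.
def imap_astring(name)
  return name unless name.empty? || ATOM_SPECIALS.match?(name)
  '"' + name.gsub(/["\\]/) { "\\#$&" } + '"'
end

# Hypothetical helper: build a BODY[HEADER.FIELDS (...)] fetch attr.
def header_fields_attr(names)
  "BODY[HEADER.FIELDS (#{names.map { imap_astring(_1) }.join(" ")})]"
end

puts header_fields_attr(["Subject", "\\Foo%", '("BAR")'])
# prints: BODY[HEADER.FIELDS (Subject "\\Foo%" "(\"BAR\")")]
```

Mixing bare atoms and quoted strings in one header list is allowed, since each `header-fld-name` is an independent `astring`.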

All of the above is unchanged by this PR. The incompatibility arises when retrieving the results from the FetchData:

```ruby
# In the current version (v0.4.3), access using quoted names:
fetch_data.attr["BODY[HEADER.FIELDS (#{quoted_names})]"] => String

# After this PR, access using unquoted names:
fetch_data.attr["BODY[HEADER.FIELDS (#{joined_names})]"] => String
```
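One way to write code that tolerates both behaviors is to try the unquoted key first and fall back to the quoted one. This is a sketch under two assumptions: a plain Hash stands in for `FetchData#attr`, and the server echoes the quoted names exactly as they were sent (if it re-quotes them differently, the fallback key will not match). `header_fields_value` is a hypothetical helper name.

```ruby
# Hypothetical helper: look up a HEADER.FIELDS result under either key form.
# `attr` is anything that responds to #[] like FetchData#attr (a Hash here).
def header_fields_value(attr, field_names)
  quoted = field_names.map { '"' + _1.gsub(/["\\]/) { "\\#$&" } + '"' }.join(" ")
  joined = field_names.join(" ")
  attr["BODY[HEADER.FIELDS (#{joined})]"] ||  # after this PR: decoded names
    attr["BODY[HEADER.FIELDS (#{quoted})]"]   # v0.4.3: names as sent by the server
end

names    = ["\\Foo%", '("BAR")']
old_attr = { "BODY[HEADER.FIELDS (\"\\\\Foo%\" \"(\\\"BAR\\\")\")]" => "(header text)" }
new_attr = { "BODY[HEADER.FIELDS (\\Foo% (\"BAR\"))]" => "(header text)" }

header_fields_value(old_attr, names) # => "(header text)"
header_fields_value(new_attr, names) # => "(header text)"
```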

However, I also prepared a version that is backward-compatible, with a smaller performance boost:

  • #217

nevans · Nov 03 '23 03:11