
Invalid Encoding Error (Unescaped Line Break)

Open nmichels opened this issue 5 years ago • 2 comments

Hi @sam-github. First, I would like to thank you for the effort you put into this gem; it is really helpful! Here is the problem I'm facing: I was testing a contact-import feature based on vCard files with my customer, and I decided to use the vpim gem to read the files and then run my business logic on the decoded data. The client reported a problem while trying to import a file, and I found out that the generated file has lines like this:

item2.TEL:777-888-9999
item2.X-ABLabel:Other
VOICE

The value contains an unescaped line break. I also noticed that between Other and VOICE there is just an LF character. According to RFC 6350 (https://tools.ietf.org/html/rfc6350), the delimiter between lines should be CRLF, so I was able to work around this with the following monkeypatch:

module Vpim
  # Enforce CRLF line breaks according to RFC 6350, Section 3.2: https://tools.ietf.org/html/rfc6350
  def Vpim.unfold(card) # :nodoc:
    unfolded = []
    card.each_line("\r\n") do |line|
      line.chomp!
      # If it's a continuation line, add it to the last.
      # If it's an empty line, drop it from the input.
      if( line =~ /^[ \t]/ )
        unfolded[-1] << line[1, line.size-1]
      elsif (unfolded.last && unfolded.last =~ /;ENCODING=QUOTED-PRINTABLE:.*?=$/)
        unfolded.last << line
      elsif( line =~ /^$/ )
      else
        unfolded << line
      end
    end
    unfolded
  end
end

But I don't know how safe it is to use this solution. Do you have any thoughts on this? Thanks a lot!

nmichels, Mar 07 '19 15:03

https://tools.ietf.org/html/rfc6350#section-3.4

NEWLINE (U+000A) characters in values MUST be encoded by two characters: a BACKSLASH followed by either an 'n' (U+006E) or an 'N' (U+004E).

So the input is either invalid or possibly not RFC 6350 encoded: VCF predates the IETF standardization, and the earlier formats were looser. Also, the RFC technically just describes the format as passed via MIME (HTTP, email, etc.), but what people often work with is files saved to disk, and when files with CRLF endings are saved to disk, they usually get converted to the local system's line-ending convention.

vpim tries to be useful (not just correct), so it does its best to support the various flavours of vCard, even slightly invalid ones, but I'm not sure what heuristic it could use to detect what you are seeing. I guess if it saw a param: value line that had no :, it could just decide to merge it with the previous line? Seems dicey. You could do this as a pre-processing step, though, before feeding the data to vpim.
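For illustration, a pre-processing step along those lines might look like the sketch below. It is only a rough heuristic under stated assumptions: `merge_broken_lines` is a hypothetical helper, not part of vpim, and it assumes that any non-folded line without a `:` is the tail of a value that was split by an unescaped line break. It re-joins such a line to the previous one with the escaped `\n` that RFC 6350 section 3.4 asks for. Quoted-printable or base64 values that span several lines could be merged incorrectly, so it would need testing against real input.

```ruby
# Hypothetical helper (not part of vpim): repairs values that were split by an
# unescaped line break before the text is handed to the parser.
def merge_broken_lines(text)
  merged = []
  text.each_line do |raw|
    line = raw.chomp            # strips a trailing "\r\n" or "\n"
    next if line.empty?         # blank lines carry no data, so drop them

    folded    = line.start_with?(" ", "\t")  # RFC-style continuation line
    has_colon = line.include?(":")           # looks like "name:value"

    if folded || has_colon || merged.empty?
      # Leave well-formed and folded lines alone for the parser to handle.
      merged << line
    else
      # Assumption: a line with no ":" is the rest of the previous value.
      # Re-join it using an escaped newline (backslash followed by "n").
      merged[-1] = merged[-1] + "\\n" + line
    end
  end
  merged.join("\r\n") + "\r\n"
end
```

Because the helper works on a string already in memory, the file only has to be read once; the repaired text can then be passed to vpim in place of the raw file contents.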

sam-github, Mar 07 '19 16:03

Thanks for the quick and clear response! Initially I didn't consider preprocessing the file because of the performance impact of reading the file twice, but I agree that this is a less "aggressive" approach than monkey patching the gem.

nmichels, Mar 07 '19 17:03