ruby-msg
Incompatible character encodings with ruby 2.1.2
Sorry to bother again but after some testing I noticed another issue.
I have my email file in the variable 'msg'. Running the following command shows the expected output (same as Ruby 1.8.7):
msg.to_mime
=> #<Mime content_type="multipart/alternative">
However, when I call to_s, I receive an error with Ruby 2.1.2
msg.to_mime.to_s
Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
from /Users/user/.rbenv/versions/2.1.2/gemsets/workers/gems/ruby-msg 1.5.2/lib/mapi/mime.rb:109:in `join'
from /Users/user/.rbenv/versions/2.1.2/gemsets/workers/gems/ruby-msg 1.5.2/lib/mapi/mime.rb:109:in `to_s'
While researching the issue, I replaced the following in lib/mapi/mime.rb#108:
part.to_s(opts)
with
part.to_s(opts).encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
After I made that change to add the .encode in mime.rb, I was able to get my emails to convert correctly.
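One caveat with this workaround, sketched below on a made-up HTML fragment (not taken from a real .msg file): when the input is tagged ASCII-8BIT, every byte above 0x7F counts as "undefined" for the conversion to UTF-8, so with :replace => "" those bytes are dropped rather than converted. The variable names here are only for illustration.

```ruby
# Made-up example body: the UTF-8 bytes for "é" inside a string tagged ASCII-8BIT.
html_bytes = "<p>caf\xC3\xA9</p>".b

# The same options as in the workaround above.
converted = html_bytes.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")

puts converted.encoding   # the result is now tagged UTF-8
puts converted.inspect    # the two high bytes are gone, not turned into "é"
```

So the conversion succeeds, but any non-ASCII characters in the affected part are silently lost.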
I also see that the 'parts' of my email have different encodings, which I believe is what the error is referring to:
irb(main):003:0> msg.to_mime.parts.each do |part|
irb(main):004:1* puts part.to_s.encoding
irb(main):005:1> end; nil
UTF-8
ASCII-8BIT
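For reference, the same error can be reproduced without ruby-msg at all. This is a minimal sketch with made-up strings; try_join is a hypothetical helper used only to capture the exception. As far as I can tell, the join only fails when both strings actually contain non-ASCII bytes; two ASCII-only strings join fine even if their encoding tags differ.

```ruby
# Made-up stand-ins for two MIME parts with different encoding tags.
utf8_part   = "text/plain: caf\u00E9"   # UTF-8 string containing a non-ASCII character
binary_part = "text/html: caf\xE9".b    # raw bytes tagged ASCII-8BIT, with one high byte

# Hypothetical helper: join the parts and return the exception instead of raising it.
def try_join(parts)
  parts.join("\r\n")
rescue Encoding::CompatibilityError => e
  e
end

mixed = try_join([utf8_part, binary_part])
puts mixed.class

# ASCII-only content is compatible regardless of the encoding tag.
ascii_only = try_join(["plain ascii", "more ascii".b])
puts ascii_only.class
```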
Alternatively, reverting the above change to mime.rb and instead adding the .encode call in lib/mapi/convert/note_mime.rb#159 on "props.body_html" also allows the email to convert correctly (that is the 'part' that is being encoded as ASCII-8BIT).
Not sure of the exact way to fix this permanently. If there is any other information I can provide I will be glad to do so.
Thanks for your help!
Yeah, a related (or the same?) problem has been mentioned here - https://github.com/aquasync/ruby-msg/pull/5. As mentioned there, I think the mime parts should be treated as binary data (i.e. tagged as ASCII-8BIT), not as encoded strings, since they describe their own text encoding through Content-Type. Indeed, a single message could have multiple parts with different encodings. I think the fix is to avoid the UTF-8 parts being introduced in the first place, which I think happens because string literals in the source code are implicitly UTF-8. I imagine adding the magic "# encoding: ASCII-8BIT" comment to the offending source files would fix the issue.
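Until that is done in the library, a rough sketch of the "treat everything as binary" idea might look like the following. Note that join_as_binary is a hypothetical helper, not something ruby-msg provides, and the sample strings are made up. It relies on String#b (available since Ruby 2.0), which returns a copy tagged ASCII-8BIT without touching the bytes.

```ruby
# Hypothetical helper, not part of ruby-msg: retag each part as binary, then join.
def join_as_binary(parts, separator = "\r\n")
  parts.map { |part| part.to_s.b }.join(separator.b)
end

utf8_part   = "caf\u00E9"    # 5 bytes, tagged UTF-8
binary_part = "caf\xE9".b    # 4 bytes, tagged ASCII-8BIT

joined = join_as_binary([utf8_part, binary_part])
puts joined.encoding   # binary, so no compatibility check can fail
puts joined.bytesize   # all original bytes are kept
```

Unlike the .encode workaround, nothing is dropped or replaced here; the bytes of each part pass through unchanged and the Content-Type headers remain responsible for describing the charset.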
I'm having this problem too. Is there a recommended workaround?