xmpp4r
xmpp4r throws REXML::ParseException on Cyrillic characters
When the simplest Jabber client (like the one at http://dumpz.org/9810/) receives a UTF-8 string that contains Cyrillic (e.g. Russian) characters, xmpp4r fails with REXML::ParseException and stops working.
Backtrace here: http://dumpz.org/9806/
Thanks.
The dumpz links don't work anymore
Erm, could you try to write a simple test case for that bug, based on one of the existing test cases? According to the exception, REXML receives ASCII-8BIT characters, not UTF-8 characters.
Same situation. http://dumpz.org/17011/ ruby 1.9.1p378 (2010-01-10 revision 26273) [x86_64-linux] xmpp4r-0.5 Tested for [email protected]
Yes. I would need a test case.
http://megabytov.net/293 - test case.
That could be a REXML problem. Since the current version of ejabberd opens an XML stream with an XML header that lacks the encoding attribute, REXML fails to retrieve the encoding and baseparser sets source.encoding to nil, which falls back to ASCII-8BIT. That's wrong behaviour according to http://www.opentag.com/xfaq_enc.htm (XML without an encoding declaration should be read as UTF-8 unless a byte order mark says otherwise), and I'm also unsure about the force_utf8 detection algorithm. An immediate fix for me was to patch REXML's:
- source.rb: http://pastie.org/1458174
- parsers/baseparser.rb: http://pastie.org/1454110
Although nothing related seemed to break for now, I only used it in the development environment, not production. Regards
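For readers who want to see the failure mode without a Jabber server, here is a minimal sketch. It does not use REXML or xmpp4r at all; `BODY_PATTERN` and `try_match` are made-up names for illustration, and the assumption is only that the parser matches a fixed-UTF-8 regexp (here forced with the `/u` flag) against data tagged as ASCII-8BIT. It also suggests why pure-ASCII traffic never triggers the problem.

```ruby
# Assumption: the file itself is saved as UTF-8 (the default since Ruby 2.0).
# BODY_PATTERN and try_match are hypothetical names, not REXML or xmpp4r API.

# The /u flag pins the regexp to UTF-8.
BODY_PATTERN = /<body>(.*)<\/body>/u

# Returns the captured text, nil when nothing matches, or :incompatible
# when Ruby refuses to match because of the encodings involved.
def try_match(data)
  match = BODY_PATTERN.match(data)
  match ? match[1] : nil
rescue Encoding::CompatibilityError
  :incompatible
end

# String#b returns a copy tagged as ASCII-8BIT (binary), which is how the
# bytes look when no encoding could be detected for the stream.
ascii_binary    = "<body>hello</body>".b
cyrillic_binary = "<body>привет</body>".b

# Re-tagging the same bytes as UTF-8 changes no data, only the label.
cyrillic_utf8 = cyrillic_binary.dup.force_encoding(Encoding::UTF_8)

p try_match(ascii_binary)     # ASCII-only binary data still matches
p try_match(cyrillic_binary)  # non-ASCII binary data cannot be matched
p try_match(cyrillic_utf8)    # the same bytes labelled UTF-8 match again
```

The bytes are identical in the last two cases; only the encoding label differs, which is consistent with the fix being to stop the source encoding from ending up as ASCII-8BIT.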
I'm having this same issue. Is there any support for swapping out the xml parsing backend for something like nokogiri?
I don't know. The last thing I came up with was this monkey patch:
if RUBY_VERSION < "1.9"
  # ...
else
  # Encoding patch: report sockets as binary
  require 'socket'
  class TCPSocket
    def external_encoding
      Encoding::BINARY
    end
  end
  require 'rexml/source'
  class REXML::IOSource
    # Keep the original setter and ignore attempts to assign nil
    alias_method :encoding_assign, :encoding=
    def encoding=(value)
      encoding_assign(value) if value
    end
  end
  begin
    # OpenSSL is optional and can be missing
    require 'openssl'
    class OpenSSL::SSL::SSLSocket
      def external_encoding
        Encoding::BINARY
      end
    end
  rescue LoadError
    # A bare rescue would not catch LoadError, which is not a StandardError
  end
end
But then more and more problems appeared (not related to encoding, but nvl), so I decided to rewrite the whole thing.
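The REXML::IOSource part of the patch above relies on a common Ruby pattern: alias the original setter, then redefine it so that nil is silently ignored. A minimal sketch of that pattern on a stand-in class follows. `DemoSource` is hypothetical and is not REXML's real class; it only assumes a source object that starts out with a sensible encoding.

```ruby
# Hypothetical stand-in for a parser source; not REXML's real class.
class DemoSource
  attr_reader :encoding

  def initialize
    @encoding = "UTF-8"
  end

  def encoding=(value)
    @encoding = value
  end
end

# Same wrapping pattern as the patch: keep the original setter under a
# new name, then forward only non-nil values to it.
class DemoSource
  alias_method :encoding_assign, :encoding=

  def encoding=(value)
    encoding_assign(value) if value
  end
end

source = DemoSource.new
source.encoding = nil           # ignored, the previous value is kept
p source.encoding
source.encoding = "ISO-8859-1"  # real values still go through
p source.encoding
```

The trade-off is that a nil assignment can no longer reset the encoding, which may be part of why the patch only worked for some use cases.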
Which part did you rewrite? Is it open source?
I've just recently started and have not much to commit yet. Will create the repository later.
Hi, this seems related. Is there some workaround?
#<Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
/usr/lib/ruby/1.9.1/rexml/source.rb:212:in 'match'
/usr/lib/ruby/1.9.1/rexml/source.rb:212:in 'match'
/usr/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:425:in 'pull'
/usr/lib/ruby/1.9.1/rexml/parsers/sax2parser.rb:92:in 'parse'
/home/brian/.gem/gems/xmpp4r-0.5/lib/xmpp4r/streamparser.rb:79:in 'parse'
/home/brian/.gem/gems/xmpp4r-0.5/lib/xmpp4r/stream.rb:75:in 'block in start'
...
Exception parsing
Line:
Position: 0
Last 80 unconsumed characters:
范</body>
@bsl not to my knowledge, unfortunately. I think what we need at this point is an actively maintained fork of the xmpp4r library.
btw, didn't the patch from https://github.com/ln/xmpp4r/issues/3#issuecomment-1739952 help?
@dotdoom It does seem to fix it. Thank you!
@dotdoom, works great for me. Thanks for the patch.
@dotdoom it works, thanks. It only needs to be merged in. Is this project dead?
The latest update was more than one year ago, so yes, it looks like it's dead... Maybe one of the forks is in better shape.
This recently bit me, so I started working on a fork with better 1.9 compatibility at https://github.com/hoxworth/xmpp4r. Right now I simply replaced the stream parser with a Nokogiri SAX parser, which handles character encodings far better. I'd like to pull out as much REXML as possible, but XMPP4R was pretty tightly coupled to REXML.
@hoxworth A while back I also started working on modernizing xmpp4r, https://github.com/whitehat101/xmpp4r, I merged most of the pulls and hacked some. I've been using that branch in "production" for months, and the only unresolved issue I'm bothered by is the utf-8 crashes. I'll check your stuff out, when I get a chance, and you might want to see mine.
I missed the monkeypatch in this issue b/c I only looked at pulls.
Awesome, @whitehat101, I'll definitely take a look. I didn't really like the monkey patch myself, and it only worked for half of my use cases. We've been using my Nokogiri patch for a while now with numerous UTF-8 XMPP sources, and haven't had a crash since.
Wow, @dotdoom, your code saved my life, thanks! :+1:
@dotdoom oh!!! Good!!! you are so generous!! oh you save me and my money. Thank you so much!