ruby-ldap
:< separator in LDIF not being parsed correctly
dn: MYDNHERE
sn: Khanin
givenName: Alex
whenCreated: 20080910232037.0Z
displayName: Khanin, Alex
department: MYDEPTHERE
sAMAccountName: myloginhere
mail: MYEMAILHERE
manager: MYMGRDNHERE
thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY
The file at file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY exists and is readable (it contains JPEG data).
If you run LDAP::LDIF.parse_file() on this LDIF, you get the following error:
irb(main):004:0> LDAP::LDIF.parse_file("/var/tmp/akhanin.ldif")
ArgumentError: invalid byte sequence in UTF-8
from org/jruby/RubyRegexp.java:1487:in `=~'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:105:in `unsafe_char?'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:323:in `parse_entry'
from org/jruby/RubyArray.java:1613:in `each'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:184:in `parse_entry'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:481:in `parse_file'
from org/jruby/RubyIO.java:1183:in `open'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:439:in `parse_file'
from (irb):4:in `evaluate'
from org/jruby/RubyKernel.java:1066:in `eval'
from org/jruby/RubyKernel.java:1392:in `loop'
from org/jruby/RubyKernel.java:1174:in `catch'
from org/jruby/RubyKernel.java:1174:in `catch'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:47:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:8:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands.rb:41:in `(root)'
from org/jruby/RubyKernel.java:1027:in `require'
from script/rails:6:in `(root)'
irb(main):005:0>
When I run "file" on that thumbnailPhoto I get the following:
ldapsearch-thumbnailPhoto-S8oDGY: JPEG image data, JFIF standard 1.01
Now if I remove the last line in the ldif (with the thumbnail ":<" reference), it parses just fine.
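For reference, RFC 2849 gives an LDIF attribute line three possible separators: ":" for a plain value, "::" for a base64-encoded value, and ":<" for a URL whose contents are the value. The sketch below shows one way those three cases could be told apart. The helper name parse_ldif_attr_line is hypothetical and is not part of ruby-ldap; it assumes a file: URL and ignores line folding, comments, and other URL schemes.

```ruby
require "uri"

# Hypothetical helper, NOT part of ruby-ldap: split one LDIF attribute line
# into [attribute, value] based on the separator that follows the name.
def parse_ldif_attr_line(line)
  # Try "::" and ":<" before the bare ":" so the longer separators win.
  m = /\A([^:]+)(::|:<|:)[ ]*(.*)\z/.match(line.chomp)
  raise ArgumentError, "not an attribute line: #{line.inspect}" unless m

  attr, sep, rest = m.captures
  value =
    case sep
    when "::" then rest.unpack("m").first              # base64-encoded value
    when ":<" then File.binread(URI.parse(rest).path)  # value is the file's raw bytes
    else rest                                          # plain value
    end
  [attr, value]
end
```

With the ":<" form the value comes back as a binary string, so any later check on it has to cope with bytes that are not valid UTF-8.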
The problem is that ruby-ldap was not written to work with UTF-8, so the unsafe_char? method fails when it is given data like this:
# return *true* if +str+ contains a character with an ASCII value > 127 or
# a NUL, LF or CR. Otherwise, *false* is returned.
#
def LDIF.unsafe_char?( str )
  # This could be written as a single regex, but this is faster.
  str =~ /^[ :]/ || str =~ /[\x00-\x1f\x7f-\xff]/
end
Wikipedia:
ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with ASCII, a significant advantage.
So the range \x00-\x1f is valid and matches as expected, but \x7f-\xff covers bytes (\x80-\xff) that are not valid on their own in UTF-8. It should be replaced with one or more other ranges, but I do not know exactly which ones.
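One possible direction, offered as a sketch and not a tested patch against ruby-ldap, is to drop the regex and inspect raw bytes, which does not depend on the string's encoding at all. The name unsafe_bytes? is hypothetical. Unlike the original, it returns true or false where the regex version returns a match index or nil.

```ruby
# Hypothetical byte-wise variant of LDIF.unsafe_char?, NOT the library's code.
# Returns true if +str+ starts with a space or colon, or contains any byte
# outside the printable ASCII range 0x20..0x7e.
def unsafe_bytes?(str)
  first = str.getbyte(0)                          # nil for an empty string
  return true if first == 0x20 || first == 0x3a   # leading ' ' or ':'

  # each_byte works on any string, including one that is not valid UTF-8.
  str.each_byte.any? { |b| b <= 0x1f || b >= 0x7f }
end

unsafe_bytes?("Khanin, Alex")      # => false
unsafe_bytes?("\xFF\xD8\xFF\xE0")  # => true (the first bytes of a JPEG file)
```

The start-of-string check stands in for the original /^[ :]/. The two differ only for multi-line input, which is already flagged as unsafe because it contains LF.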
Patches are welcome.