
:< separator in LDIF not being parsed correctly

Open • twiz718 opened this issue 12 years ago • 1 comment

```
dn: MYDNHERE
sn: Khanin
givenName: Alex
whenCreated: 20080910232037.0Z
displayName: Khanin, Alex
department: MYDEPTHERE
sAMAccountName: myloginhere
mail: MYEMAILHERE
manager: MYMGRDNHERE
thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY
```

The file file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY exists and is readable (it contains JPEG data).

Running LDAP::LDIF.parse_file() on this LDIF raises the following error:

```
irb(main):004:0> LDAP::LDIF.parse_file("/var/tmp/akhanin.ldif")
ArgumentError: invalid byte sequence in UTF-8
from org/jruby/RubyRegexp.java:1487:in `=~'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:105:in `unsafe_char?'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:323:in `parse_entry'
from org/jruby/RubyArray.java:1613:in `each'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:184:in `parse_entry'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:481:in `parse_file'
from org/jruby/RubyIO.java:1183:in `open'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/jruby-ldap-0.0.2/lib/ldap/ldif.rb:439:in `parse_file'
from (irb):4:in `evaluate'
from org/jruby/RubyKernel.java:1066:in `eval'
from org/jruby/RubyKernel.java:1392:in `loop'
from org/jruby/RubyKernel.java:1174:in `catch'
from org/jruby/RubyKernel.java:1174:in `catch'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:47:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands/console.rb:8:in `start'
from /Users/akhanin/.rvm/gems/jruby-1.7.2@backyard/gems/railties-3.2.11/lib/rails/commands.rb:41:in `(root)'
from org/jruby/RubyKernel.java:1027:in `require'
from script/rails:6:in `(root)'
```

When I run "file" on that thumbnailPhoto file, I get: ldapsearch-thumbnailPhoto-S8oDGY: JPEG image data, JFIF standard 1.01

If I remove the last line of the LDIF (the thumbnailPhoto ":<" reference), the file parses fine.
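For context, RFC 2849 defines "attr:< URL" as "the value is the contents of this URL", and a JPEG is arbitrary binary data, not UTF-8 text. The trace suggests the file's bytes end up in a String tagged UTF-8. A minimal sketch of how a parser could resolve such a line while keeping the value binary, assuming only local file:// URLs need to be handled (parse_url_line is a hypothetical name, not part of ruby-ldap or jruby-ldap):

```ruby
require "uri"

# Hypothetical helper (not part of ruby-ldap): split an LDIF line of the form
# "attr:< URL" and load the referenced value as raw bytes.
# Returns [attr, bytes], or nil if the line is not a ":<" line.
def parse_url_line(line)
  match = line.match(/\A([A-Za-z0-9.;-]+):<[ ]*(\S+)\s*\z/)
  return nil unless match

  attr, url = match[1], match[2]
  uri = URI.parse(url)
  # Assumption: this sketch only supports local file:// URLs.
  raise ArgumentError, "unsupported URL scheme: #{uri.scheme}" unless uri.scheme == "file"

  # File.binread returns an ASCII-8BIT (binary) String, so later checks on the
  # value are not tripped up by bytes that are invalid in UTF-8.
  [attr, File.binread(uri.path)]
end
```

With the issue's LDIF, parse_url_line("thumbnailPhoto:< file:///var/tmp/ldapsearch-thumbnailPhoto-S8oDGY") would return "thumbnailPhoto" together with the JPEG bytes as a binary String.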

twiz718, Feb 04 '13 23:02

The problem is that ruby-ldap was not written to work with UTF-8, and its unsafe_char? method fails when parsing this file:

```ruby
# return *true* if +str+ contains a character with an ASCII value > 127 or
# a NUL, LF or CR. Otherwise, *false* is returned.
#
def LDIF.unsafe_char?( str )
  # This could be written as a single regex, but this is faster.
  str =~ /^[ :]/ || str =~ /[\x00-\x1f\x7f-\xff]/
end
```
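The failure can be reproduced without ruby-ldap. A minimal sketch, assuming MRI Ruby 2.x or later (the JRuby trace above shows the same message): a String tagged UTF-8 that holds raw JPEG bytes has a broken encoding, and matching a regexp against it raises ArgumentError. An ASCII-only pattern is used here instead of the original /[\x7f-\xff]/, because recent Ruby versions may reject that literal in a UTF-8 source file.

```ruby
# Raw JPEG files start with FF D8 FF E0; byte 0xFF never occurs in valid UTF-8.
jpeg_bytes = "\xFF\xD8\xFF\xE0".dup.force_encoding(Encoding::UTF_8)

puts jpeg_bytes.valid_encoding?  # prints "false"

# A regexp match against a String with a broken encoding raises ArgumentError.
error =
  begin
    jpeg_bytes =~ /[\x00-\x1f]/
    nil
  rescue ArgumentError => e
    e
  end

puts error.message  # prints "invalid byte sequence in UTF-8"
```

So the exception comes from the String's encoding tag, not from which bytes the pattern is looking for.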

Wikipedia:

ASCII was incorporated into the Unicode character set as the first 128 symbols, so the ASCII characters have the same numeric codes in both sets. This allows UTF-8 to be backward compatible with ASCII, a significant advantage.

So the range \x00-\x1f is plain ASCII and works as expected, but bytes above \x7f (\x80-\xff) are not valid on their own in UTF-8. That part of the pattern should be replaced with another sequence, or several, but I do not know which exactly.

Patches are welcome.
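One possible direction for such a patch, offered as a hedged sketch rather than the fix the maintainers adopted: inspect raw bytes instead of running a regexp over text, since String#getbyte and String#each_byte work even on strings whose encoding is broken. unsafe_value? is a hypothetical name; it keeps the same rules as the quoted unsafe_char? (leading space or colon, any byte in \x00-\x1f, or any byte from \x7f up).

```ruby
# Hypothetical byte-based replacement for LDIF.unsafe_char?.
# Returns true when the value is not a "safe" LDIF string.
def unsafe_value?(str)
  # A value may not start with a space (0x20) or a colon (0x3A).
  first = str.getbyte(0)
  return true if first == 0x20 || first == 0x3A

  # Control bytes 0x00-0x1F (which include NUL, LF and CR) and any byte >= 0x7F.
  # Iterating over bytes never raises, whatever the String's encoding tag.
  str.each_byte.any? { |b| b <= 0x1F || b >= 0x7F }
end

jpeg = "\xFF\xD8\xFF\xE0".dup.force_encoding(Encoding::UTF_8)
unsafe_value?(jpeg)           # => true, with no ArgumentError
unsafe_value?("Khanin")       # => false
unsafe_value?("Khanin, Alex") # => false
```

Because only byte values are compared, the check gives the same answer under MRI and JRuby no matter how the value was tagged when it was read.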

ghost, Feb 05 '13 13:02