Unable to read Binary String

Open Jasmeet2011 opened this issue 5 years ago • 8 comments

Describe the bug

I am reading a Docx file saved as a BLOB field in a MySQL database. The value arrives from the MySQL table as a binary string, extracted from the Logstash "Event". I am able to write the binary string to a file and then read it using Docx. However, if I pass the data directly to Docx, it raises an error.

To Reproduce

Steps to reproduce the behavior or put a short code to reproduce the bug.

example

require 'docx'
require 'stringio'
# I WRITE THE BINARY STRING TO A DOCX FILE AND READ IT
File.binwrite('c:\path\filename.doc', event.get('Blob field'))
doc = Docx::Document.new('/path/to/your/docx/filename.docx')
# ERROR -- THIS DOES NOT WORK
doc = Docx::Document.new(event.get('Blob field'))
# TRIED TO CONVERT THE DATA TO A STRINGIO, BUT DID NOT WORK
file_to_read = StringIO.new(event.get('Blob field'))
doc = Docx::Document.new(file_to_read)

Expected behavior

Is there a way to pass a StringIO directly to Docx, or some other way to avoid writing the file to disk and then reading it back?
Sorry for the wrong label.

Environment
- Ruby version: [e.g 2.7.1]
- `docx` gem version: [e.g 0.5.0]
- Windows

Jasmeet2011 avatar May 25 '20 05:05 Jasmeet2011

What exactly does event.get('Blob field') return?

WaKeMaTTa avatar May 25 '20 08:05 WaKeMaTTa

Thanks for the response. As per the Logstash documentation:

Syntax: event.get(field)
Returns: Value for this field, or nil if the field does not exist. Returned values could be a string, numeric or timestamp scalar value.

In my case, the field is a BLOB stored in a MySQL table. According to the definition of BLOB:

BLOB values are treated as binary strings (byte strings). They have the binary character set and collation, and comparison and sorting are based on the numeric values of the bytes in column values.

So event.get('Blob field') should return a binary string.

Jasmeet2011 avatar May 25 '20 15:05 Jasmeet2011

@Jasmeet2011 can you provide a sample of your "binary string"?

WaKeMaTTa avatar May 25 '20 16:05 WaKeMaTTa

I can read the binary string and write it out as a Word document, and I can send the Word document as read from the Event API. The binary string, when written to a file using File.binwrite('new.docx', event.get('resume')) (where 'resume' is the field containing the Blob), can be read using Docx. I don't know of any other way to copy the binary string. Please suggest.

Attached: new.docx (copy of the file)

Jasmeet2011 avatar May 25 '20 16:05 Jasmeet2011

So I managed to view part of the Blob data content: #<Sequel::SQL::Blob:0x840 bytes=7093 start="PK\x03\x04\x14\x00\b\b\b\x00" end="<\x02\x00\x00c\x19\x00\x00\x00\x

Jasmeet2011 avatar Jun 07 '20 06:06 Jasmeet2011
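
The `start="PK\x03\x04..."` prefix in that inspect output is the ZIP local-file-header signature, which is consistent with the blob holding a .docx file (a ZIP container). A minimal sketch for checking this on the raw bytes, assuming the field value behaves like a Ruby String (Sequel::SQL::Blob is a String subclass). The `sample` bytes and the `zip_container?` helper are hypothetical stand-ins, not real event data:

```ruby
# ZIP local-file-header signature, as binary (ASCII-8BIT) bytes.
ZIP_SIGNATURE = "PK\x03\x04".b

# Hypothetical helper: true when the bytes begin with the ZIP signature.
def zip_container?(bytes)
  bytes.to_s.b.start_with?(ZIP_SIGNATURE)
end

# Stand-in for event.get('resume'); only the leading bytes shown in the
# inspect output above are reproduced here.
sample = "PK\x03\x04\x14\x00\b\b\b\x00".b

puts zip_container?(sample)     # does it look like a ZIP container?
puts sample.include?("\x00".b)  # does the data contain NUL bytes?
```

Both checks matter here: the first suggests the data itself is intact, and the second shows why the value cannot be treated as an ordinary text string or file name.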

# ERROR -- THIS DOES NOT WORK
doc = Docx::Document.new(event.get('Blob field'))

@Jasmeet2011 could you give us the error messages and backtraces appearing at this line?

satoryu avatar Jun 21 '20 23:06 satoryu

Sure, I will get back to you.

Jasmeet2011 avatar Jun 24 '20 07:06 Jasmeet2011

# THIS DOES NOT WORK
doc = Docx::Document.new(event.get('resume'))

This is the Error I receive:

`][ERROR][logstash.filters.ruby    ] Ruby exception occurred: string contains null byte'

'C:/Users/sun/Downloads/elk/logstash-6.8.0/vendor/bundle/jruby/2.5.0/gems/awesome_print-1.7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated'
'{
    "first_name" => "Janine ",
           "dob" => 1980-01-03T18:30:00.000Z,
          "tags" => [
        [0] "_rubyexception"
    ],
         "email" => "[email protected]\r",
      "@version" => "1",
    "@timestamp" => 2020-06-27T05:33:01.018Z,
            "id" => 4,
     "last_name" => "Labrune",
         "phone" => "(406) 785-5588",
          "type" => "docx",
        "resume" => #<Sequel::SQL::Blob:0x80a bytes=5568 start="PK\x03\x04\x14\x00\b\b\b\x00" end="<\x02\x00\x00n\x13\x00\x00\x00\x00">
}`

Jasmeet2011 avatar Jun 27 '20 05:06 Jasmeet2011
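
A note on the error: `string contains null byte` is the message Ruby raises when a String containing a NUL byte is used as a file path, which suggests the blob is being treated as a file name rather than as file contents. The sketch below reproduces that with the standard library only, then shows a temp-file workaround. It assumes the installed `docx` version passes a String argument to a path-based open (not confirmed in this thread), and `with_docx_tempfile` is a hypothetical helper name:

```ruby
require 'tempfile'

# Stand-in for event.get('resume'): binary bytes containing NUL bytes.
blob = "PK\x03\x04\x14\x00\b\b\b\x00".b

# Using the bytes as a *path* fails before any file is opened.
begin
  File.open(blob, 'rb') { |f| f.read }
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end

# Hypothetical helper: write the bytes to a temp file, yield its path,
# and let Tempfile.create delete the file once the block returns.
def with_docx_tempfile(bytes)
  Tempfile.create(['blob', '.docx']) do |file|
    file.binmode        # no newline or encoding translation
    file.write(bytes)
    file.close          # flush to disk before another reader opens it
    yield file.path
  end
end

# In the Logstash filter this could wrap the existing path-based call:
#   with_docx_tempfile(event.get('resume')) { |path| Docx::Document.new(path) }
round_trip = with_docx_tempfile(blob) { |path| File.binread(path) }
puts round_trip == blob
```

This still touches disk, so it does not fully answer the original question, but it avoids managing a fixed path and cleans up after itself. Reading straight from memory would depend on whether the gem version in use accepts an IO object or buffer.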