
Won't Read ONIX Feed

Open acolchagoff opened this issue 10 years ago • 4 comments

I've got an ONIX feed that is sent to me as a zip file attached to an email. The zip file contains a 100+ MB XML file and a DTD file. The top of the XML file looks like this:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ONIXMessage SYSTEM
"ONIX_BookProduct_3.0_short.dtd">
<ONIXmessage release="3.0">
<header>
<sender>
<x298>Publisher</x298>
<x299>Vendor</x299>
<j272>[email protected]</j272>
</sender>
<x307>20140311</x307>
<m183>An Onix message file from Publisher</m183>
</header>
```

Even though this file contains well over 10,000 products, the gem won't read any of them:

```ruby
reader.each do |product|
  puts product.inspect
end
```

The `each` loop does nothing: its block never fires, as if the XML file contained zero products.
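To confirm the products really are in the file, and to see which tag spelling they use, a quick diagnostic can count product start tags by spelling. A minimal sketch, assuming the feed has line breaks (a single-line 100+ MB file would still be read as one huge line); `count_product_tags` is a hypothetical helper, not part of the onix gem:

```ruby
# Hypothetical diagnostic: count <Product> (reference tag) and <product>
# (short tag) start tags, reading one line at a time so a 100+ MB feed
# is never held in memory as a whole (assumes the file has line breaks).
def count_product_tags(path)
  counts = { short: 0, reference: 0 }
  # Binary mode: no encoding errors if the feed has stray invalid bytes
  File.open(path, "rb") do |file|
    file.each_line do |line|
      # The [\s>\/] class stops <ProductIdentifier> etc. from matching;
      # closing tags like </product> never match because of the slash.
      line.scan(/<(Product|product)[\s>\/]/) do |match|
        counts[match[0] == "Product" ? :reference : :short] += 1
      end
    end
  end
  counts
end
```

For a short-tag feed like this one, you would expect a large `:short` count and a `:reference` count of zero.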

I've spent several days on this. Here's the entire method for reference:

```ruby
def self.parse_onix(publisher_id, onix_file)
  # Zip::ZipFile is the pre-1.0 rubyzip name; rubyzip 1.0+ calls it Zip::File
  Zip::ZipFile.open(onix_file.tempfile.path) do |zip|
    xml_file = nil
    dir = "#{Rails.root}/tmp/onix/"

    zip.each do |entry|
      # Skip macOS metadata and directory entries
      next if entry.name =~ /__MACOSX/ || entry.name =~ /\.DS_Store/ || !entry.file?
      logger.debug entry.name

      dest = dir + entry.name
      FileUtils.mkdir_p(File.dirname(dest))
      # Returning true from the block overwrites a file left by an earlier run
      entry.extract(dest) { true }

      xml_file = dest if entry.name.end_with?(".xml")
    end
    raise "no .xml file found in the ONIX zip" unless xml_file

    # Make the DOCTYPE's relative DTD reference resolvable from dir
    Work.fix_dtd_path(dir, xml_file)

    reader = ONIX::Reader.new(xml_file)
    puts reader.inspect

    # In this issue, this block never runs
    reader.each do |product|
      puts product.inspect
    end
  end
end

# Rewrites the relative DTD name in the DOCTYPE as a path inside dir.
def self.fix_dtd_path(dir, xml_file)
  dtd_file = "ONIX_BookProduct_3.0_short.dtd"
  xml = File.read(xml_file)
  # gsub replaces every occurrence of the name, not only the one in the DOCTYPE;
  # File.write truncates the file, so no separate delete is needed
  File.write(xml_file, xml.gsub(dtd_file, dir + dtd_file))
end
```
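`fix_dtd_path` reads the entire 100+ MB file into memory and runs `gsub` over all of it, even though only the DOCTYPE needs to change. A minimal sketch of a lighter alternative, assuming the DOCTYPE uses a `SYSTEM` identifier and fits within the first 4 KB of the file (`rewrite_dtd_path`, `HEADER_BYTES` and `DOCTYPE_SYSTEM` are hypothetical names, not part of the onix gem):

```ruby
# Hypothetical alternative to fix_dtd_path: rewrite only the DOCTYPE's
# SYSTEM identifier and stream the rest of the file through unchanged.
# Assumes the whole DOCTYPE lies within the first HEADER_BYTES bytes.
HEADER_BYTES = 4096
DOCTYPE_SYSTEM = /(<!DOCTYPE\s+\S+\s+SYSTEM\s+)"([^"]+)"/

def rewrite_dtd_path(xml_path, dtd_dir)
  tmp_path = "#{xml_path}.tmp"
  File.open(xml_path, "rb") do |input|
    File.open(tmp_path, "wb") do |output|
      header = input.read(HEADER_BYTES) || "".b
      # Point the SYSTEM identifier at the same file name inside dtd_dir
      header = header.sub(DOCTYPE_SYSTEM) do
        m = Regexp.last_match
        %(#{m[1]}"#{File.join(dtd_dir, File.basename(m[2]))}").b
      end
      output.write(header)
      IO.copy_stream(input, output) # copy everything after the header as-is
    end
  end
  File.rename(tmp_path, xml_path) # replace the original in one step
end
```

Note the argument order: `rewrite_dtd_path(xml_file, dir)` would stand in for `Work.fix_dtd_path(dir, xml_file)`.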

acolchagoff, Mar 20 '14 22:03

I'm not sure what the problem might be, but try converting the ONIX file to reference tags:

```ruby
ONIX::Normaliser.process("oldfile.xml", "newfile.xml")
```

If this converts it to reference tags, you should be able to iterate over the products in the file.

varunarang, Mar 21 '14 06:03

Unfortunately, normalizing doesn't seem to help, but I think I've figured out the issue: this gem doesn't appear to support ONIX 3.0 short tags, which is what my XML feed uses. Because the feed uses short tags, all of the tag names are different (for example, `Header` becomes `header` and `PublisherIDType` becomes `x447`), and the gem looks for reference tags and ignores the short ones.

Would this explain the issues I'm having?
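One way to confirm which tag flavour a feed uses without loading the whole file: ONIX short-tag files use the root element `<ONIXmessage>`, while reference-tag files use `<ONIXMessage>`. A minimal sketch that peeks at the first few KB (`onix_flavour` is a hypothetical helper, not part of the gem). It reads the root element rather than the DOCTYPE name, since in the header above the DOCTYPE says `ONIXMessage` while the root element is `ONIXmessage`:

```ruby
# Hypothetical helper: report whether an ONIX file uses short or
# reference tags, plus its release attribute, reading only the start
# of the file. Short-tag files use the root <ONIXmessage>; reference-tag
# files use <ONIXMessage>.
def onix_flavour(path, peek_bytes = 4096)
  head = File.open(path, "rb") { |f| f.read(peek_bytes) } || "".b
  # The first "<" followed by a letter is the root element; this skips
  # the "<?xml ...?>" declaration and the "<!DOCTYPE ...>" line.
  m = head.match(/<([A-Za-z][\w.\-]*)([^>]*)>/)
  return nil unless m

  root = m[1].force_encoding(Encoding::UTF_8)
  release = m[2][/\brelease\s*=\s*["']([^"']+)["']/, 1]
  tags = case root
         when "ONIXmessage" then :short
         when "ONIXMessage" then :reference
         else :unknown
         end
  { root: root, tags: tags, release: release && release.force_encoding(Encoding::UTF_8) }
end
```

For a file starting with the header shown at the top of this issue, this returns `{ root: "ONIXmessage", tags: :short, release: "3.0" }`.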

acolchagoff, Mar 21 '14 14:03

Making progress. I'm now getting this error when calling the normalizer:

```
/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/onix20140331-4641-16q41ea:3: warning: failed to load external entity "/var/folders/nb/nc2b5f2s7rdch1nxfxyd2d200000gq/T/ONIX_BookProduct_3.0_short.dtd"
"ONIX_BookProduct_3.0_short.dtd">
```

The normalizer appears to look for my DTD file in the temp directory, even though it never copied the DTD there. The DTD file is still in the folder I extracted the zip into.
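The path in the warning suggests the normalizer resolves the relative DTD name against the directory of its temporary copy, which on this machine is the system temp directory (`Dir.tmpdir`). A minimal sketch of copying the DTDs there before normalizing, assuming that is how the path is resolved and that every DTD needed sits in the extraction directory (`copy_dtds_to` is a hypothetical helper, not part of the gem):

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical workaround: copy every .dtd extracted from the zip into
# the directory the normalizer's temp copy lives in (assumed to be
# Dir.tmpdir, as the warning's path suggests). Copying all of them also
# covers DTDs that include one another.
def copy_dtds_to(src_dir, dest_dir = Dir.tmpdir)
  dtds = Dir.glob(File.join(src_dir, "*.dtd"))
  FileUtils.cp(dtds, dest_dir) unless dtds.empty?
  # Return the destination paths, e.g. for cleanup afterwards
  dtds.map { |path| File.join(dest_dir, File.basename(path)) }
end
```

Calling `copy_dtds_to(dir)` with the extraction directory just before `ONIX::Normaliser.process` would replace copying the files by hand.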

acolchagoff, Mar 31 '14 19:03

Okay, after manually copying three DTD files (interdependent DTDs?), I've fixed that error, but the XSLT conversion still seems to fail. I think it's because the XSLT stylesheet distributed with the gem is for ONIX 2.1.

acolchagoff, Mar 31 '14 20:03