metafacture-core

Make DTD loading in XmlDecoder optional

Open thomasseidel opened this issue 10 years ago • 3 comments

Currently, the XmlDecoder loads referenced DTDs and fails on broken links. It would be useful if automatic DTD loading were a configurable XmlDecoder option.

thomasseidel avatar Jul 08 '15 15:07 thomasseidel

I encountered this problem as well! Thanks.

liowalter avatar Feb 19 '16 13:02 liowalter

@liowalter Hi Lionel,

I played around a bit and can now provide a solution that works, although I think it is not yet in the state it should eventually reach.

You can find the code in [1]

I had to copy and paste the complete implementation of MF core [2] because the type is declared as final.

As you can see in [1], it's really simple. Entities are currently not handled at all: the code returns an empty string, which has the same effect as what you do right now with sed, where you remove the DTD prolog.
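The empty-string approach can be sketched with the standard SAX API. This is only an illustration, not the code from [1]: the class names EmptyDtdDemo and EmptyEntityHandler and the broken DTD link are made up for the example, and the handler is a plain DefaultHandler rather than a Metafacture type.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Metafacture or nlmfCommands.
class EmptyDtdDemo {

    // Answers every external entity request (including the DTD) with empty content,
    // so the parser never follows the link in the DOCTYPE declaration.
    static class EmptyEntityHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the document and returns its character data.
    static String parse(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EmptyEntityHandler handler = new EmptyEntityHandler();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The DTD link is deliberately broken (assumed path for illustration).
        String xml = "<!DOCTYPE article SYSTEM \"file:///no-such-dir/missing.dtd\">"
                + "<article>hello</article>";
        System.out.println(parse(xml));
    }
}
```

Because the resolver always supplies a source, the parser has no reason to open the broken link itself.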

My idea:

  • Provide a configuration (e.g. a root directory with optional subdirectories) where all the possible DTDs are stored.
  • The new type then looks in all of these directories for the file requested by the parser.
  • If nothing is found, either an empty string is returned (as my new implementation does now) or an exception is thrown. This should be configurable as well. I think we should discuss which behaviour is the most sensible default.
  • Once we have implemented something, I would make a pull request.
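The lookup described in the list above could look roughly like the following sketch. Nothing here exists yet: the class name DirectoryDtdResolver, the failIfMissing switch, and the choice to match on the last path segment of the system identifier are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: looks up requested DTDs in a local directory tree.
class DirectoryDtdResolver implements EntityResolver {
    private final Path root;             // configured root directory holding the DTDs
    private final boolean failIfMissing; // true: throw, false: return empty content

    DirectoryDtdResolver(Path root, boolean failIfMissing) {
        this.root = root;
        this.failIfMissing = failIfMissing;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        Optional<Path> match = Optional.empty();
        if (systemId != null) {
            // Assumption: the file name is the last path segment of the system identifier.
            String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
            try (Stream<Path> files = Files.walk(root)) {
                match = files.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equals(fileName))
                        .findFirst();
            }
        }
        if (match.isPresent()) {
            // Read the file fully so that no file handle stays open.
            InputSource source =
                    new InputSource(new ByteArrayInputStream(Files.readAllBytes(match.get())));
            source.setSystemId(match.get().toUri().toString());
            return source;
        }
        if (failIfMissing) {
            throw new SAXException("DTD not found under " + root + ": " + systemId);
        }
        return new InputSource(new StringReader(""));
    }
}
```

An instance would be passed to XMLReader.setEntityResolver. Whether a missing DTD should throw or yield empty content is exactly the open question from the list, so the sketch leaves it as a constructor argument.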

I will send you the jar, which you have to put into the plugins directory of your repository. Then you have to start the Flux script with the property -Dflux.pluginsdir=[absolute path to the plugins dir]. This might already be done by the MF-Runner repository if it is deployed correctly (I'm not sure).

Use the new Flux command in your Flux script:

    generic-xml-handle-dtd ("article") | //"article" is the record delimiter
    //handle-generic-xml ("article") | //"article" is the record delimiter

Sorry, unfortunately I haven't worked with MF for some weeks, and then I always need some time to get back into it... But I want to work with it more regularly in the future!

Günter

[1] https://github.com/guenterh/nlmfCommands/blob/master/src/main/java/org/swissbib/mf/stream/converter/xml/GenericXmlDTDHandler.java#L107
[2] https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/stream/converter/xml/GenericXmlHandler.java#L39

guenterh avatar Feb 19 '16 19:02 guenterh

"Currently, the XmlDecoder loads referenced DTDs and fails on broken links."

@thomasseidel could you show me the code where this handling is done? As far as I know, the resolveEntity method implementation in DefaultXmlPipe returns null, see https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/framework/DefaultXmlPipe.java#L114
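For reference, in the SAX API a null return from resolveEntity asks the parser to open the system identifier itself, so a null-returning implementation does not by itself suppress DTD loading. The following sketch illustrates this with a plain DefaultHandler, whose resolveEntity also returns null. It does not use DefaultXmlPipe; the class name NullResolverDemo and the broken DTD link are assumptions for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: parses with a handler whose resolveEntity returns null.
class NullResolverDemo {

    // Returns "parsed" on success, otherwise "failed: " plus the exception type.
    static String tryParse(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return "parsed";
        } catch (Exception e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // No DOCTYPE: nothing to load.
        System.out.println(tryParse("<article>hello</article>"));
        // DOCTYPE with a deliberately broken link: the parser tries to fetch it.
        System.out.println(tryParse(
                "<!DOCTYPE article SYSTEM \"file:///no-such-dir/missing.dtd\">"
                        + "<article>hello</article>"));
    }
}
```

With the JDK's default parser settings, the second call is expected to fail because the parser attempts to read the missing file.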

boferri avatar Sep 06 '16 09:09 boferri