Make DTD loading in XmlDecoder optional
Currently, the XmlDecoder loads referenced DTDs and fails on broken links. It would be useful if automatic DTD loading were a configurable XmlDecoder option.
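A configurable option like this could map onto a parser feature. The sketch below assumes the JDK's built-in Xerces-based SAX parser, which recognizes the non-standard Xerces feature http://apache.org/xml/features/nonvalidating/load-external-dtd. The class name DtdLoadingOption and its loadExternalDtd flag are hypothetical and are not part of the XmlDecoder API.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: parses XML with external DTD loading switched on or off. */
class DtdLoadingOption {

    // Non-standard Xerces feature; honoured by the JDK's default SAX parser.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Returns true if the document parsed without error. */
    static boolean parse(String xml, boolean loadExternalDtd) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(LOAD_EXTERNAL_DTD, loadExternalDtd);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (Exception e) {
            // Covers configuration, SAX and I/O errors,
            // e.g. a FileNotFoundException for a DTD that cannot be found.
            return false;
        }
    }

    public static void main(String[] args) {
        // The DOCTYPE points to a DTD that does not exist.
        String xml = "<!DOCTYPE article SYSTEM \"missing-dir/missing.dtd\">"
                + "<article><title>t</title></article>";
        System.out.println("DTD loading on:  " + parse(xml, true));
        System.out.println("DTD loading off: " + parse(xml, false));
    }
}
```

With the feature switched off, the parser skips the external DTD entirely, so anything declared in it (entities, default attributes) is simply not available.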
I encountered this problem as well! Thanks.
@liowalter Hi Lionel,
I played around a little and can now provide a solution that works, although I think it does not yet have the status it should have in the end.
You can find the code in [1].
I had to copy and paste the complete implementation from MF core [2] because the type is declared final.
As you can see in [1], it's really simple. For now, entities aren't handled at all: the code returns an empty string, which has the same effect as what you currently do with sed when you remove the DTD prolog.
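The "return an empty string" approach can be sketched with the standard SAX API. This is not the code from [1]; it is a minimal illustration assuming the JDK's default SAX parser, and the class name EmptyDtdHandler is hypothetical. Note that it answers every external entity request with empty content, not only the DTD, which matches the caveat that entities aren't handled at all.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that answers every external entity request with empty content. */
class EmptyDtdHandler extends DefaultHandler {

    private int elementCount = 0;

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Instead of opening systemId, hand the parser an empty input.
        return new InputSource(new StringReader(""));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        elementCount++;
    }

    /** Parses xml with this handler as content handler and entity resolver. */
    static int countElements(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EmptyDtdHandler handler = new EmptyDtdHandler();
        reader.setContentHandler(handler);
        reader.setEntityResolver(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.elementCount;
    }

    public static void main(String[] args) throws Exception {
        // missing.dtd does not exist, but the parser never tries to open it.
        String xml = "<!DOCTYPE article SYSTEM \"missing.dtd\"><article><title>t</title></article>";
        System.out.println("elements: " + countElements(xml));
    }
}
```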
My idea:
- provide a configuration (e.g. a root directory with optional subdirectories) where all the possible DTDs are put
- the new type then looks up each file requested by the parser in all of these directories
- if nothing is found, either an empty string is returned (as my new implementation does now) or an exception is thrown. This should be configurable as well. I think we should discuss which option makes the most sense.
- once we have implemented something, I would make a pull request.
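The lookup idea above could look roughly like the following EntityResolver. It is a sketch under assumptions: the class name DirectoryDtdResolver, matching a DTD only by the last path segment of its system ID, and the failOnMissing switch are hypothetical design choices, not an existing Metafacture API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical resolver that looks up requested DTDs in configured directories. */
class DirectoryDtdResolver implements EntityResolver {

    private final List<Path> roots;
    private final boolean failOnMissing;

    DirectoryDtdResolver(List<Path> roots, boolean failOnMissing) {
        this.roots = roots;
        this.failOnMissing = failOnMissing;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId == null) {
            return null; // nothing to look up; fall back to the parser's default
        }
        // Match on the file name only, e.g. ".../dtd/article.dtd" -> "article.dtd".
        String fileName = systemId.substring(systemId.lastIndexOf('/') + 1);
        Optional<Path> match = find(fileName);
        if (match.isPresent()) {
            return new InputSource(match.get().toUri().toString());
        }
        if (failOnMissing) {
            throw new SAXException("DTD not found in configured directories: " + systemId);
        }
        return new InputSource(new StringReader("")); // lenient: act as if the DTD were empty
    }

    /** Searches every root directory, including subdirectories, for fileName. */
    private Optional<Path> find(String fileName) throws IOException {
        for (Path root : roots) {
            if (!Files.isDirectory(root)) {
                continue;
            }
            try (Stream<Path> paths = Files.walk(root)) {
                Optional<Path> match = paths
                        .filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equals(fileName))
                        .findFirst();
                if (match.isPresent()) {
                    return match;
                }
            }
        }
        return Optional.empty();
    }

    /** Parses xml with the given resolver and returns all character data. */
    static String textOf(String xml, EntityResolver resolver) throws Exception {
        StringBuilder text = new StringBuilder();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```

Matching by file name alone is the simplest rule; since different DTDs can share a file name, a fuller version might also consult the public ID, for example via an OASIS XML catalog.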
I will send you the jar, which you have to put into the plugins directory of your repository. Then you have to start the Flux script with the property -Dflux.pluginsdir=[absolute path to the plugins dir]. It might be that the MF-Runner repository does this if it is deployed correctly (I'm not sure).
In your Flux script, use the new Flux command generic-xml-handle-dtd in place of handle-generic-xml:
generic-xml-handle-dtd ("article") | // "article" is the record delimiter
//handle-generic-xml ("article") | // "article" is the record delimiter
Sorry, unfortunately I haven't worked with MF for some weeks, and then I always need some time to get back into it. But I want to work on it more steadily in the future!
Günter
[1] https://github.com/guenterh/nlmfCommands/blob/master/src/main/java/org/swissbib/mf/stream/converter/xml/GenericXmlDTDHandler.java#L107
[2] https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/stream/converter/xml/GenericXmlHandler.java#L39
"Currently, the XmlDecoder loads referenced DTDs and fails on broken links."
@thomasseidel, could you show me the code where this handling is done? As far as I know, the resolveEntity method implementation in DefaultXmlPipe returns null; see https://github.com/culturegraph/metafacture-core/blob/master/src/main/java/org/culturegraph/mf/framework/DefaultXmlPipe.java#L114
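For reference: in the SAX API, a resolveEntity that returns null asks the parser to open the system identifier itself, so a null return does not by itself prevent DTD loading, which would be consistent with the failures on broken links. The demo below is a hypothetical illustration with the JDK's default SAX parser; it assumes, without having checked, that XmlDecoder registers the pipe as the parser's entity resolver. The class name NullResolverDemo is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo: a resolver that returns null still lets the parser fetch the DTD. */
class NullResolverDemo extends DefaultHandler {

    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId); // record what the parser asked for
        return null;             // null = "open the system ID the normal way"
    }

    /** Returns the error as a string, or null if parsing succeeded. */
    String parse(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(this);
            reader.setEntityResolver(this);
            reader.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        NullResolverDemo demo = new NullResolverDemo();
        String error = demo.parse("<!DOCTYPE article SYSTEM \"missing.dtd\"><article/>");
        System.out.println("requested: " + demo.requested);
        System.out.println("error:     " + error);
    }
}
```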