kafka-connect-transform-xml

ERROR: White spaces are required between publicId and systemId

Open rmoff opened this issue 4 years ago • 2 comments

Version 0.1.0.18

Installed using:

confluent-hub install --no-prompt jcustenborder/kafka-connect-transform-xml:0.1.0.18

Config:

curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/source-file-01/config \
    -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp.xml",
    "topic": "xmltest",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "transforms": "xml",
    "transforms.xml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
    "transforms.xml.schema.path": "http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd"
    }'

Transform failed with error org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.

[2020-09-08 14:08:41,647] INFO [source-file-01|task-0] FromXmlConfig values:
   package = com.github.jcustenborder.kafka.connect.transform.xml.model
   schema.path = [http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd]
   xjc.options.automatic.name.conflict.resolution.enabled = false
   xjc.options.strict.check.enabled = true
   xjc.options.verbose.enabled = false
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:347)
[2020-09-08 14:08:41,699] INFO [source-file-01|task-0] compileContext() - Generating source for http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:99)
[2020-09-08 14:08:42,278] ERROR [source-file-01|task-0] fatalError (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:36)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanExternalID(XMLScanner.java:1072)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.scanDoctypeDecl(XMLDocumentScannerImpl.java:642)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:924)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
   at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
   at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
   at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:395)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:275)
   at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.parseSchema(SchemaCompilerImpl.java:158)
   at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:103)
   at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
   at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
   at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:515)
   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:467)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1186)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:127)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1201)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1197)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:395)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:275)
   at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.parseSchema(SchemaCompilerImpl.java:158)
   at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:103)
   at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)
   at org.apache.kafka.connect.runtime.ConnectorConfig.transformations(ConnectorConfig.java:264)
   at org.apache.kafka.connect.runtime.Worker.buildWorkerTask(Worker.java:515)
   at org.apache.kafka.connect.runtime.Worker.startTask(Worker.java:467)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.startTask(DistributedHerder.java:1186)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder.access$1600(DistributedHerder.java:127)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1201)
   at org.apache.kafka.connect.runtime.distributed.DistributedHerder$12.call(DistributedHerder.java:1197)
   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   at java.lang.Thread.run(Thread.java:748)

Source XML file:

tmp.xml.zip

rmoff, Sep 08 '20 14:09

Tried tweaking a couple of the exposed config values, but got the same error.

[2020-09-08 14:19:04,531] INFO [source-file-01c|task-0] FromXmlConfig values:
   package = com.github.jcustenborder.kafka.connect.transform.xml.model
   schema.path = [http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd]
   xjc.options.automatic.name.conflict.resolution.enabled = false
   xjc.options.strict.check.enabled = true
   xjc.options.verbose.enabled = false
 (com.github.jcustenborder.kafka.connect.transform.xml.FromXmlConfig:347)
[2020-09-08 14:19:04,533] INFO [source-file-01c|task-0] compileContext() - Generating source for http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:99)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1239)
[2020-09-08 14:19:04,846] ERROR [source-file-01c|task-0] fatalError (com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler:36)
   at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:395)
org.xml.sax.SAXParseException; systemId: http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd; lineNumber: 1; columnNumber: 50; White spaces are required between publicId and systemId.
   at com.sun.tools.xjc.reader.internalizer.DOMForest.parse(DOMForest.java:275)
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
   at com.sun.tools.xjc.api.impl.s2j.SchemaCompilerImpl.parseSchema(SchemaCompilerImpl.java:158)
   at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
   at com.github.jcustenborder.kafka.connect.transform.xml.XSDCompiler.compileContext(XSDCompiler.java:103)
   at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
   at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanExternalID(XMLScanner.java:1072)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.scanDoctypeDecl(XMLDocumentScannerImpl.java:642)
   at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:924)
   at com.github.jcustenborder.kafka.connect.transform.xml.FromXml.configure(FromXml.java:130)

rmoff, Sep 08 '20 14:09

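This one is a weird one. It looks like the schema loader gets angry when the URL answers with a 301 redirect: the http:// address redirects, and the parser presumably ends up reading the redirect response instead of the XSD (the stack trace fails in scanDoctypeDecl, and an XSD would not normally carry a DOCTYPE). Moving to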

schema.path = https://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd

got me to the point where it would load the XSD. The next problem is that this schema defines two Comment elements, which angers it again. I'm going to add support to control some of this output.

#32
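
For anyone hitting the same error, here is a sketch of the workaround described above, using the connector name, port, and file path from the original report. The first command is only a suggested check (not something run in this thread) to see whether the http:// URL still answers with a redirect. The second is the reported config with nothing changed except the scheme of schema.path. As noted above, this particular schema may still fail later on the duplicate Comment elements.

```shell
# Optional check: print the status line and any Location header returned for
# the original http:// URL. No expected output is shown because the server's
# behaviour may have changed since this issue was filed.
curl -sI http://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd | grep -iE '^(HTTP|location)'

# Same connector config as in the report; only schema.path now uses https://.
curl -i -X PUT -H "Content-Type:application/json" http://localhost:8083/connectors/source-file-01/config \
    -d '{
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp.xml",
    "topic": "xmltest",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter",
    "transforms": "xml",
    "transforms.xml.type": "com.github.jcustenborder.kafka.connect.transform.xml.FromXml$Value",
    "transforms.xml.schema.path": "https://datex2.eu/schema/1_0/1_0/DATEXIISchema_1_0_1_0.xsd"
    }'
```

If the first command reports a 301 with an https:// Location, pointing schema.path directly at that target avoids the redirect altogether.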

jcustenborder avatar Sep 10 '20 18:09 jcustenborder