scala-xml
ConstructingParser does not tolerate start of file whitespace
We use ConstructingParser so that file/line/column information is added to the parsed XML, and so that CDATA regions are handled properly.
However, we have run into several cases where we had to make it more lenient.
In particular, we discovered that it requires the very first character of an XML file to be "<", starting either an XML prolog, a comment, a DTD, or the root element.
We have numerous XML files that begin with whitespace, e.g., a blank line followed by comments, other ProcInstrs, and so on.
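To illustrate the whitespace case, here is a minimal stdlib-only Scala sketch. It assumes a lenient loader simply skips XML whitespace (space, tab, CR, LF, the spec's S production) before looking for the first "<". The `LeadingWhitespace` object and its methods are hypothetical names for illustration only. They are not part of scala-xml's ConstructingParser or of Daffodil's loader.

```scala
// Hypothetical helper (not scala-xml API): locate where markup begins when a
// file may start with blank lines or other whitespace.
object LeadingWhitespace {
  // XML whitespace per the spec's S production: space, tab, CR, LF.
  private def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\r' || c == '\n'

  // Offset of the first non-whitespace character, or text.length if none.
  def firstMarkupOffset(text: String): Int = {
    val i = text.indexWhere(c => !isXmlSpace(c))
    if (i < 0) text.length else i
  }

  // Strict behaviour: '<' must be the very first character.
  def startsStrictly(text: String): Boolean = text.startsWith("<")

  // Lenient behaviour: '<' may follow leading whitespace.
  def startsLeniently(text: String): Boolean =
    text.startsWith("<", firstMarkupOffset(text))
}

val sample = "\n\n<!-- a comment -->\n<root/>"
println(LeadingWhitespace.startsStrictly(sample))  // false
println(LeadingWhitespace.startsLeniently(sample)) // true
```

Skipping only the four XML whitespace characters, rather than anything `Char.isWhitespace` accepts, keeps the check aligned with what an XML parser itself treats as whitespace.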
We also have numerous XML files that begin with "<?xml" where that is NOT an XML prolog but a processing instruction whose target merely starts with "xml", as in:
<?xml-model href="...." ... ?>
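The "<?xml" case can be decided by reading the whole processing-instruction target: the XML declaration uses exactly the target "xml", whereas "<?xml-model" has the target "xml-model". The sketch below is a hypothetical, stdlib-only illustration of that check; `XmlDeclCheck`, `Lead`, and `classify` are invented names, not scala-xml API.

```scala
// Hypothetical classifier (not scala-xml API): does a leading "<?xml" open the
// XML declaration, or an ordinary processing instruction such as "<?xml-model"?
object XmlDeclCheck {
  sealed trait Lead
  case object XmlDeclaration extends Lead
  final case class ProcInstr(target: String) extends Lead
  case object OtherMarkup extends Lead

  private def isXmlSpace(c: Char): Boolean =
    c == ' ' || c == '\t' || c == '\r' || c == '\n'

  // Classify the markup starting at `offset` (e.g. after skipping whitespace).
  def classify(text: String, offset: Int = 0): Lead =
    if (!text.startsWith("<?", offset)) OtherMarkup
    else {
      val rest = text.substring(offset + 2)
      // The target name ends at the first whitespace or '?' (as in "?>").
      val end = rest.indexWhere(c => isXmlSpace(c) || c == '?')
      val target = if (end < 0) rest else rest.substring(0, end)
      // Only the exact name "xml" opens the declaration; "xml-model",
      // "xml-stylesheet", etc. are regular processing instructions.
      if (target == "xml") XmlDeclaration else ProcInstr(target)
    }
}

println(XmlDeclCheck.classify("<?xml version=\"1.0\"?><root/>"))       // XmlDeclaration
println(XmlDeclCheck.classify("<?xml-model href=\"s.rng\"?><root/>"))  // ProcInstr(xml-model)
```

Comparing the full target, instead of testing only for the "<?xml" prefix, is what lets a lenient parser accept files like the example above.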
All of these are accepted by standard Xerces.
So we have enhanced our constructing parser to tolerate them as well.
Our overrides of the ConstructingParser methods are all in this file:
project: https://github.com/apache/daffodil
file: daffodil-lib/src/main/scala/org/apache/daffodil/xml/DaffodilConstructingLoader.scala
I can create a PR with the suggested changes, but before doing so I wanted to run the idea past the scala-xml maintainers. Is there any reason it should not be enhanced in this way?
I don't see why this shouldn't be supported. Please feel free to submit a PR.