java-mammoth
Convert Word documents to simple and clean HTML
Please help fix this on Android. Conversion fails with the following exception:
java.lang.RuntimeException: org.xml.sax.SAXNotRecognizedException: http://apache.org/xml/features/disallow-doctype-decl
    at org.zwobble.mammoth.internal.xml.parsing.SimpleSax.parseInputSource(SimpleSax.java:67)
    at org.zwobble.mammoth.internal.xml.parsing.SimpleSax.parseStream(SimpleSax.java:24)
    at org.zwobble.mammoth.internal.xml.parsing.XmlParser.parseStream(XmlParser.java:24)
    at org.zwobble.mammoth.internal.docx.OfficeXml.parseXml(OfficeXml.java:38)
Caused by: org.xml.sax.SAXNotRecognizedException: http://apache.org/xml/features/disallow-doctype-decl
    at org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
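The trace suggests that Android's Harmony-based `SAXParserFactoryImpl` rejects the Xerces-specific `disallow-doctype-decl` feature with `SAXNotRecognizedException`. Below is a minimal sketch of one possible workaround, not mammoth's actual code: set each hardening feature through a helper that reports failure instead of throwing, and fall back to the standard SAX entity features when the Xerces one is unavailable. `SafeSaxFactory`, `trySetFeature` and `newHardenedFactory` are hypothetical names, and whether Android's parser accepts the fallback features is an assumption, so check the return values rather than silently accepting weaker protection.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class SafeSaxFactory {
    // Xerces-specific feature: reject any document containing a DOCTYPE.
    static final String DISALLOW_DOCTYPE =
        "http://apache.org/xml/features/disallow-doctype-decl";
    // Standard SAX features that control whether external entities are resolved.
    static final String EXTERNAL_GENERAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";
    static final String EXTERNAL_PARAMETER_ENTITIES =
        "http://xml.org/sax/features/external-parameter-entities";

    // Tries to set a feature; returns false instead of throwing if the parser rejects it.
    static boolean trySetFeature(SAXParserFactory factory, String name, boolean value) {
        try {
            factory.setFeature(name, value);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException | ParserConfigurationException e) {
            return false;
        }
    }

    // Builds a factory with the strongest hardening the current parser accepts (hypothetical helper).
    static SAXParserFactory newHardenedFactory() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        if (!trySetFeature(factory, DISALLOW_DOCTYPE, true)) {
            // Assumption: the parser may still honour the standard SAX entity features.
            // The results are ignored here; production code should check them.
            trySetFeature(factory, EXTERNAL_GENERAL_ENTITIES, false);
            trySetFeature(factory, EXTERNAL_PARAMETER_ENTITIES, false);
        }
        return factory;
    }
}
```

On a standard JDK the Xerces feature is recognized, so no fallback is needed and documents with a DOCTYPE are rejected.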
Exception in thread "main" java.lang.IllegalStateException: Duplicate key org.zwobble.mammoth.internal.docx.Numbering$AbstractNumLevel@72bef795
    at java.util.stream.Collectors.lambda$throwingMerger$0(Collectors.java:133)
    at java.util.HashMap.merge(HashMap.java:1254)
    at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
    at java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
    at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
    at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1625)
    at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:270)
    at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1625)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    ...
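The `Collectors.lambda$throwingMerger$0` frame shows the exception comes from `Collectors.toMap` being called without a merge function while two entries map to the same key. On Java 8 the message prints one of the colliding values rather than the key, which is why an `AbstractNumLevel` object appears in it; Java 9 and later report the key. The sketch below reproduces the failure with a hypothetical `Level` class and shows the three-argument `toMap` overload, which resolves collisions instead of throwing. Keeping the first entry is an arbitrary policy chosen for illustration, not necessarily how mammoth should treat duplicate numbering levels.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ToMapDuplicates {
    // Hypothetical stand-in for a numbering level, keyed by its level index.
    static final class Level {
        final String levelIndex;
        final String format;

        Level(String levelIndex, String format) {
            this.levelIndex = levelIndex;
            this.format = format;
        }
    }

    // Two-argument toMap: throws IllegalStateException ("Duplicate key ...") on a collision.
    static Map<String, Level> strict(List<Level> levels) {
        return levels.stream()
            .collect(Collectors.toMap(level -> level.levelIndex, Function.identity()));
    }

    // Three-argument toMap: the merge function keeps the first level seen for each index.
    static Map<String, Level> keepFirst(List<Level> levels) {
        return levels.stream()
            .collect(Collectors.toMap(level -> level.levelIndex, Function.identity(),
                (first, second) -> first));
    }
}
```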
I am using Java 8 and mammoth 1.5.0. I have a lettered list:
A. Option A
B. Option B
but it is converted into a numbered list:
1. Option A
2. Option B
Does your library support *.docx revision history?
Hi, when the document is very large and multiple parsing tasks run at the same time, `OutOfMemoryError: GC overhead limit exceeded` or `OutOfMemoryError: Java heap space` occurs. Is...
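One common mitigation, sketched here under the assumption that the heap pressure comes from too many large documents being converted at once, is to cap the number of simultaneous conversions with a `Semaphore` (raising the maximum heap with `-Xmx` is the other obvious lever). `BoundedConversions` and its `convert` callback are hypothetical; the callback stands in for whatever conversion call your code makes, for example a call to mammoth's `DocumentConverter`.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Function;

public class BoundedConversions {
    // Each permit allows one conversion to be in flight at a time.
    private final Semaphore permits;

    public BoundedConversions(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Blocks until a permit is free, runs the conversion, then always releases the permit.
    public <T, R> R run(T input, Function<T, R> convert) throws InterruptedException {
        permits.acquire();
        try {
            return convert.apply(input);
        } finally {
            permits.release();
        }
    }
}
```

With a permit count of 2, extra callers block in `run` until a slot frees up, so at most two documents are being converted at any moment regardless of how many threads the surrounding pool has.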
I have the following Word document: [2Lists.docx](https://github.com/mwilliamson/java-mammoth/files/9535848/2Lists.docx). When I convert it to HTML, the two lists are merged into one: `Li1 Li2 NewLi1 NewLi2`. This is the Word internal structure:...
I have the following Word document: [list-2.docx](https://github.com/mwilliamson/java-mammoth/files/9535026/list-2.docx). When I convert it to HTML, I get this structure: `LI1 Content Li2`. I expected to have only one list...
Hi, some cross-references are not converted into anchors. I attached a sample: [sample.docx](https://github.com/mwilliamson/java-mammoth/files/6109338/sample.docx). The result of the conversion should be: ``` Title Reference to: Heading Heading Content1 ``` but the...
Hi there. Just a short question: is there any way to get the width and height of images as set in the Word document? Right now images are imported at their original size...
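For reference, DrawingML stores the displayed size of an inline image on `wp:extent` as `cx`/`cy` in English Metric Units (EMU), where 914,400 EMU equal one inch; at the CSS reference density of 96 px per inch, one pixel is 9,525 EMU. The sketch below only shows that unit conversion. It assumes you already have the `cx`/`cy` values and does not assume java-mammoth passes them to an image converter; `EmuToPixels` is a hypothetical helper.

```java
public class EmuToPixels {
    // 914,400 English Metric Units (EMU) per inch, as used by DrawingML sizes such as wp:extent cx/cy.
    static final long EMU_PER_INCH = 914_400L;
    // CSS reference pixel density.
    static final long PX_PER_INCH = 96L;
    // 914,400 / 96 = 9,525 EMU per CSS pixel.
    static final long EMU_PER_PX = EMU_PER_INCH / PX_PER_INCH;

    // Converts an EMU length to whole CSS pixels, rounding to the nearest pixel.
    static long emuToPx(long emu) {
        return Math.round(emu / (double) EMU_PER_PX);
    }

    // Builds width/height attributes for an <img> tag from extent values.
    static String sizeAttributes(long cx, long cy) {
        return "width=\"" + emuToPx(cx) + "\" height=\"" + emuToPx(cy) + "\"";
    }
}
```

For example, an image whose extent is `cx="1905000" cy="952500"` comes out as `width="200" height="100"`.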
As it stands, the API converts every line inside a text box into an individual `p` element. It would be great if those converted `p` elements were wrapped in a `div`...