kotlin-toolkit
Valid Epub cannot be parsed: "java.lang.Exception: Unable to find an OPF file".
Bug Report
What happened?
When trying to open the attached epub, we see the following stack trace:
Caused by: java.lang.Exception: Unable to find an OPF file.
at org.readium.r2.streamer.parser.epub.EpubParser.getRootFilePath(EpubParser.kt:175)
at org.readium.r2.streamer.parser.epub.EpubParser._parse(EpubParser.kt:98)
at org.readium.r2.streamer.parser.epub.EpubParser.parse(EpubParser.kt:90)
at org.readium.r2.streamer.Streamer$open$builder$1.invokeSuspend(Streamer.kt:126)
... 24 common frames omitted
Comparing this epub to others that Readium parses successfully, I noticed that its META-INF/container.xml file has no line breaks. I know that the absence of line breaks doesn't make the XML invalid, but I wonder whether the parser implicitly assumes that the XML contains line breaks.
Expected behavior
Readium should be able to parse the epub, since I can open it without a problem in other tools such as Books on macOS.
How to reproduce?
Attempt to parse the broken.epub using org.readium.r2.streamer.parser.epub.EpubParser.parse
Environment
- Readium version: 2.2.0
Development environment
android_build_tools_version = '30.0.2'
android_compile_sdk_version = 29
android_target_sdk_version = 29
android_min_sdk_version = 21
Testing device
- Android version:
- Model:
- Is it an emulator? No
Additional context
- Are you willing to fix the problem and contribute a pull request? With guidance, I may be able to do that.
I can't reproduce the error from the test app.
We use an XML parser, so whitespace doesn't matter. However, there is an extra invisible character in META-INF/container.xml before the opening <?xml: the bytes EF BB BF, which are a UTF-8 byte order mark (BOM). Maybe that's what is tripping up the parser in your case.
$ xxd META-INF/container.xml
00000000: efbb bf3c 3f78 6d6c 2076 6572 7369 6f6e ...<?xml version
00000010: 3d22 312e 3022 2065 6e63 6f64 696e 673d ="1.0" encoding=
00000020: 2275 7466 2d38 223f 3e3c 636f 6e74 6169 "utf-8"?><contai
00000030: 6e65 7220 7665 7273 696f 6e3d 2231 2e30 ner version="1.0
00000040: 2220 786d 6c6e 733d 2275 726e 3a6f 6173 " xmlns="urn:oas
00000050: 6973 3a6e 616d 6573 3a74 633a 6f70 656e is:names:tc:open
00000060: 646f 6375 6d65 6e74 3a78 6d6c 6e73 3a63 document:xmlns:c
00000070: 6f6e 7461 696e 6572 223e 3c72 6f6f 7466 ontainer"><rootf
00000080: 696c 6573 3e3c 726f 6f74 6669 6c65 2066 iles><rootfile f
00000090: 756c 6c2d 7061 7468 3d22 4f45 4250 532f ull-path="OEBPS/
000000a0: 636f 6e74 656e 742e 6f70 6622 206d 6564 content.opf" med
000000b0: 6961 2d74 7970 653d 2261 7070 6c69 6361 ia-type="applica
000000c0: 7469 6f6e 2f6f 6562 7073 2d70 6163 6b61 tion/oebps-packa
000000d0: 6765 2b78 6d6c 2220 2f3e 3c2f 726f 6f74 ge+xml" /></root
000000e0: 6669 6c65 733e 3c2f 636f 6e74 6169 6e65 files></containe
000000f0: 723e 0a r>.
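If the leading `efbb bf` bytes in the dump above do turn out to be the cause, one possible workaround on the app side is to drop them before the bytes reach the XML parser. The following is only a minimal sketch, not Readium's code: `skipUtf8Bom` is a hypothetical helper name, and it assumes the container.xml content is available as a plain `InputStream`.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.io.PushbackInputStream

// The three bytes that encode U+FEFF (byte order mark) in UTF-8.
private val UTF8_BOM = byteArrayOf(0xEF.toByte(), 0xBB.toByte(), 0xBF.toByte())

/**
 * Hypothetical helper (not part of Readium): returns a stream positioned
 * after a leading UTF-8 BOM if one is present, or at the start otherwise.
 */
fun InputStream.skipUtf8Bom(): InputStream {
    val pushback = PushbackInputStream(this, UTF8_BOM.size)
    val head = ByteArray(UTF8_BOM.size)

    // Read up to three bytes; a single read() call may return fewer.
    var read = 0
    while (read < head.size) {
        val n = pushback.read(head, read, head.size - read)
        if (n < 0) break
        read += n
    }

    // If the bytes are not a BOM, push them back so nothing is lost.
    val hasBom = read == UTF8_BOM.size && head.contentEquals(UTF8_BOM)
    if (!hasBom && read > 0) pushback.unread(head, 0, read)
    return pushback
}

fun main() {
    val xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?><container/>"
    val withBom = UTF8_BOM + xml.toByteArray(Charsets.UTF_8)

    val cleaned = ByteArrayInputStream(withBom).skipUtf8Bom().readBytes()
    println(cleaned.toString(Charsets.UTF_8) == xml) // true
}
```

Streams without a BOM pass through unchanged, so a wrapper like this could be applied to every XML resource regardless of how the file was produced.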