Kanna
HTML of Kanna instance stripped down on Xcode 12
Description:
I am seeing some strange behavior that I have not seen before. If I create a Kanna instance and then access its .toHTML property, it does not return the full HTML of the document. Oddly, the result depends on the deployment target: when building for iOS 14, .toHTML returns almost all of the raw data, but when building for iOS 13.5 it returns only a small portion. I first noticed the issue after switching to Xcode 12 and its toolchain.
Installation method:
- [x] CocoaPods
Kanna version:
5.2.2
swift --version:
5.3
Xcode version:
Version 12.0 (12A7209)
How to reproduce:
Create a Kanna instance with some HTML:
let document = try! Kanna.HTML(html: htmlText, encoding: String.Encoding.utf8)
print(document.toHTML)
When looking at the console log you can see that the HTML is not complete.
Anyone else seeing this?
Regards, Erik
After further investigation using breakpoints, I can see in the debugger that the html property of the libxmlHTMLDocument object contains the whole HTML source. However, the content (or toHTML) of the HTMLDocument is missing most of the source. Strangely, far more is missing on iOS 13 and earlier than on iOS 14.
@fishfisher Thank you for your report.
There seems to be a problem with libxml2 in iOS 13.5. (Kanna depends on libxml2.)
I will not address it because it's an issue with a specific version of iOS.
If you have specific problems with this issue, please let us know.
Thanks
[note] I ran the following code to determine whether the problem lies in Kanna or in libxml2.
#include <libxml/parser.h>
#include <libxml/xpath.h>
#include <libxml/HTMLtree.h>
// Fetch a sample HTML page.
NSData* data = [NSData dataWithContentsOfURL: [NSURL URLWithString:@"https://github.com/tid-kijyun/Kanna/issues/239"]];
// Parse it with libxml2's HTML parser.
xmlDoc* doc = htmlReadDoc((xmlChar*)[data bytes], NULL, NULL, HTML_PARSE_RECOVER | HTML_PARSE_NOERROR | HTML_PARSE_NOWARNING);
// Serialize the parsed document through an output buffer backed by an xmlBuffer.
xmlBufferPtr buff = xmlBufferCreate();
xmlOutputBufferPtr outputBuff = xmlOutputBufferCreateBuffer(buff, NULL);
htmlDocContentDumpOutput(outputBuff, doc, "utf8");
NSLog(@"%s", xmlOutputBufferGetContent(outputBuff)); //< There seems to be a problem with xmlOutputBufferGetContent in iOS 13.5.
// Release the output buffer and its backing buffer.
xmlOutputBufferClose(outputBuff);
xmlBufferFree(buff);
This code uses only libxml2, yet the problem still reproduces on iOS 13.5. Therefore, I have no way to fix it in Kanna.
Hi @tid-kijyun. Thanks for looking into this. I am actually seeing this issue on any iOS version below iOS 14 when building with Xcode 12. I did not see this issue using Xcode 11.
For me it is reproducible every time when building for any iOS other than iOS 14 on Xcode 12:
let someLargeHTMLString: String
let document = try! Kanna.HTML(html: someLargeHTMLString, encoding: String.Encoding.utf8)
print(document.toHTML)
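To make the truncation easier to quantify than by reading the console log, here is a minimal sketch that compares what goes in with what comes out. It assumes a synthetic document is enough to trigger the problem, which I have not confirmed. The helpers makeLargeHTML and occurrences, the "row" class name, and the paragraph count of 5000 are all made up for illustration; only Kanna.HTML(html:encoding:) and toHTML come from the code above.

```swift
import Foundation
import Kanna

// Hypothetical helper: builds a large, well-formed HTML string
// containing `paragraphs` easily countable <p> elements.
func makeLargeHTML(paragraphs: Int) -> String {
    let body = (0..<paragraphs)
        .map { "<p class=\"row\">row \($0)</p>" }
        .joined(separator: "\n")
    return "<html><head><title>t</title></head><body>\(body)</body></html>"
}

// Hypothetical helper: counts non-overlapping occurrences of `needle`.
func occurrences(of needle: String, in haystack: String) -> Int {
    return haystack.components(separatedBy: needle).count - 1
}

let paragraphCount = 5_000  // arbitrary size, chosen only for illustration
let input = makeLargeHTML(paragraphs: paragraphCount)

do {
    let document = try Kanna.HTML(html: input, encoding: .utf8)
    // toHTML is optional; treat nil as an empty result.
    let output = document.toHTML ?? ""
    let serialized = occurrences(of: "<p class=\"row\">", in: output)
    print("input paragraphs: \(paragraphCount), serialized paragraphs: \(serialized)")
    print("input bytes: \(input.utf8.count), output bytes: \(output.utf8.count)")
} catch {
    print("parse failed: \(error)")
}
```

If the serialized paragraph count is lower than the input count, the output was truncated, and running the same code against different deployment targets should show how much each one loses.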
Let me know if I can help with anything or if I can provide more information.
Brgds, Erik
@fishfisher Thank you for the additional information. You're right, this is an issue that occurs when using libxml2 on Xcode 12/iOS 13.5. It's not a problem in Xcode 11.
Well, as you can see from my verification code, this is a problem with Xcode 12/iOS 13.5 and libxml2, not caused by Kanna. All we can do is find a workaround or report it to Apple and wait for Xcode to fix it. If you're having trouble with this issue and want us to do something about it, I'll find a workaround, but for now I'll wait for Apple to fix it.
Thanks
@tid-kijyun Very understandable - thanks for taking your time to check!
Should this be closed? @tid-kijyun, sorry for the ping.