MWFeedParser
Incorrect text encoding with feeds containing "euc-kr" text encoding
Hello, I had issues with feeds in certain text encodings when the HTTP headers did not indicate the encoding (for example: http://www.torrentrg.com/bbs/rss.php?bo_table=torrent_variety ). To fix this, I read the encoding from the XML declaration when the HTTP headers do not provide it:
<?xml version="1.0" encoding="euc-kr"?>
If you want to check this, here is my change in MWFeedParser.m, in - (void)startParsingData:(NSData *)data textEncodingName:(NSString *)textEncodingName:
[...]
// Not UTF-8 so convert
MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
NSString *string = nil;
// Attempt to detect encoding from response header
NSStringEncoding nsEncoding = 0;
[...]
becomes:
[...]
// Not UTF-8 so convert
MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
NSString *string = nil;
// If no text encoding indication was in the response header
// then try to get encoding from the XML declaration
if (textEncodingName == nil) {
    // Inspect at most the first 100 bytes, without reading past the end of short documents
    NSUInteger prefixLength = MIN((NSUInteger)100, [data length]);
    NSData *xmlEncodingData = [NSData dataWithBytesNoCopy:(void *)[data bytes]
                                                   length:prefixLength
                                             freeWhenDone:NO];
    NSString *xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSUTF8StringEncoding];
    if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSISOLatin1StringEncoding];
    if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSMacOSRomanStringEncoding];
    if ([xmlEncodingString hasPrefix:@"<?xml"]) {
        // Restrict the search to the declaration itself (everything before "?>")
        NSRange a = [xmlEncodingString rangeOfString:@"?>"];
        if (a.location != NSNotFound) {
            NSString *xmlDec = [xmlEncodingString substringToIndex:a.location];
            NSRange b = [xmlDec rangeOfString:@"encoding=\""];
            if (b.location != NSNotFound) {
                // s is the index of the first character of the encoding name
                NSUInteger s = b.location + b.length;
                NSRange c = [xmlDec rangeOfString:@"\"" options:0 range:NSMakeRange(s, [xmlDec length] - s)];
                if (c.location != NSNotFound) {
                    textEncodingName = [xmlDec substringWithRange:NSMakeRange(s, c.location - s)];
                }
            }
        }
    }
    [xmlEncodingString release];
}
// Attempt to detect encoding from response header or XML declaration
NSStringEncoding nsEncoding = 0;
[...]
Thanks! The above code fixes the problem.
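For anyone adapting this, the lookup in the XML declaration can also be sketched as a small plain C helper that works on the raw bytes; plain C compiles unchanged inside a .m file. This is only a sketch under a few assumptions: the function name xml_declared_encoding and its parameters are hypothetical and not part of MWFeedParser, and the document is assumed to start with an ASCII-compatible declaration (so UTF-16 feeds and a leading byte order mark are not handled). Unlike a search for encoding=" alone, it also accepts single quotes and whitespace around the equals sign, both of which the XML specification permits.

```c
#include <stddef.h>
#include <string.h>

/* XML whitespace: space, tab, carriage return, line feed. */
static int xml_is_space(unsigned char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
}

/* Hypothetical helper (not part of MWFeedParser): copies the value of the
 * `encoding` pseudo-attribute of a leading XML declaration into `out` as a
 * NUL-terminated string. Returns 1 on success and 0 when there is no
 * declaration, no encoding, or `out` is too small. Never reads past `length`. */
static int xml_declared_encoding(const unsigned char *bytes, size_t length,
                                 char *out, size_t out_size)
{
    static const char prefix[] = "<?xml";
    static const char key[] = "encoding";
    const size_t prefix_len = sizeof prefix - 1;
    const size_t key_len = sizeof key - 1;
    size_t end, i, start;
    unsigned char quote;

    if (out_size == 0) return 0;
    if (length < prefix_len || memcmp(bytes, prefix, prefix_len) != 0) return 0;

    /* Locate the "?>" that closes the declaration. */
    for (end = prefix_len; end + 1 < length; end++) {
        if (bytes[end] == '?' && bytes[end + 1] == '>') break;
    }
    if (end + 1 >= length) return 0;

    /* Scan the declaration for: encoding, optional space, '=', optional space, quote. */
    for (i = prefix_len; i + key_len <= end; i++) {
        if (memcmp(bytes + i, key, key_len) != 0) continue;
        i += key_len;
        while (i < end && xml_is_space(bytes[i])) i++;
        if (i >= end || bytes[i] != '=') return 0;
        i++;
        while (i < end && xml_is_space(bytes[i])) i++;
        if (i >= end || (bytes[i] != '"' && bytes[i] != '\'')) return 0;
        quote = bytes[i++];
        start = i;
        while (i < end && bytes[i] != quote) i++;   /* find the matching quote */
        if (i >= end || i - start >= out_size) return 0;
        memcpy(out, bytes + start, i - start);
        out[i - start] = '\0';
        return 1;
    }
    return 0;
}
```

In startParsingData:textEncodingName: this could be called with [data bytes] and [data length] when textEncodingName is nil, and a successful result turned into an NSString with stringWithUTF8String: before the existing encoding detection runs.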