
Incorrect text encoding for feeds using "euc-kr" encoding

Open • sylverb opened this issue 13 years ago • 1 comment

Hello, I had problems with some text encodings when the HTTP headers did not specify the encoding (for example with this feed: http://www.torrentrg.com/bbs/rss.php?bo_table=torrent_variety ). To fix this, I read the encoding from the XML declaration whenever the HTTP headers don't provide one:

<?xml version="1.0" encoding="euc-kr"?>
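The lookup described above can be sketched in plain C (which is also valid Objective-C). This is only an illustration under stated assumptions: the function name xml_declared_encoding is hypothetical and not part of MWFeedParser, and it assumes the document starts with an ASCII-compatible "<?xml" (no BOM, not UTF-16), which is the same assumption the patch makes. It looks only at the first len bytes, stops at the closing "?>", and, going slightly beyond the patch in this issue, accepts single-quoted values and whitespace around "=", both of which the XML 1.0 grammar allows.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not an MWFeedParser API): copy the encoding
   pseudo-attribute of an XML declaration at the start of buf into out.
   Only the first len bytes are examined. Returns 1 on success, 0 if there
   is no declaration, no encoding, or out is too small. */
static int xml_declared_encoding(const char *buf, size_t len,
                                 char *out, size_t out_size)
{
    static const char prefix[] = "<?xml";
    const size_t prefix_len = sizeof prefix - 1;
    size_t end, i;

    if (len < prefix_len || memcmp(buf, prefix, prefix_len) != 0)
        return 0;

    /* Restrict the search to the declaration itself, up to "?>". */
    for (end = prefix_len; end + 1 < len; end++)
        if (buf[end] == '?' && buf[end + 1] == '>')
            break;
    if (end + 1 >= len)
        return 0; /* declaration not closed within the window */

    for (i = prefix_len; i + 8 <= end; i++) {
        if (memcmp(buf + i, "encoding", 8) != 0)
            continue;
        size_t j = i + 8;
        while (j < end && isspace((unsigned char)buf[j])) j++; /* before '=' */
        if (j >= end || buf[j] != '=')
            continue;
        j++;
        while (j < end && isspace((unsigned char)buf[j])) j++; /* after '=' */
        if (j >= end || (buf[j] != '"' && buf[j] != '\''))
            continue;
        char quote = buf[j++]; /* either " or ' */
        size_t start = j;
        while (j < end && buf[j] != quote) j++;
        if (j >= end)
            return 0; /* unterminated value */
        size_t n = j - start;
        if (n == 0 || n + 1 > out_size)
            return 0;
        memcpy(out, buf + start, n);
        out[n] = '\0';
        return 1;
    }
    return 0;
}
```

Called on the first min(100, length) bytes of the declaration shown above, as the patch does, it should copy "euc-kr" into out; mapping that IANA name to a string encoding is left to the existing conversion code.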

If you want to check it, here is my change to MWFeedParser.m, in - (void)startParsingData:(NSData *)data textEncodingName:(NSString *)textEncodingName. This code:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

        // Attempt to detect encoding from response header
        NSStringEncoding nsEncoding = 0;
        [...]

becomes:

        [...]
        // Not UTF-8 so convert
        MWLog(@"MWFeedParser: XML document was not UTF-8 so we're converting it");
        NSString *string = nil;

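        // If the response header gave no text encoding,
        // try to read it from the XML declaration instead
        if (textEncodingName == nil) {
            // Inspect at most the first 100 bytes, and never read past the end of a short document
            NSUInteger prefixLength = MIN((NSUInteger)100, [data length]);
            NSData* xmlEncodingData = [NSData dataWithBytesNoCopy:(void *)[data bytes]
                                                           length:prefixLength
                                                     freeWhenDone:NO];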
            NSString* xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSUTF8StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSISOLatin1StringEncoding];
            if (!xmlEncodingString) xmlEncodingString = [[NSString alloc] initWithData:xmlEncodingData encoding:NSMacOSRomanStringEncoding];

            if ([xmlEncodingString hasPrefix:@"<?xml"]) {
                NSRange a = [xmlEncodingString rangeOfString:@"?>"];
                if (a.location != NSNotFound) {
                    NSString *xmlDec = [xmlEncodingString substringToIndex:a.location];
                    NSRange b = [xmlDec rangeOfString:@"encoding=\""];
                    if (b.location != NSNotFound) {
                        NSUInteger s = b.location+b.length;
                        NSRange c = [xmlDec rangeOfString:@"\"" options:0 range:NSMakeRange(s, [xmlDec length] - s)];
                        if (c.location != NSNotFound) {
                            textEncodingName = [xmlEncodingString substringWithRange:NSMakeRange(b.location+b.length,c.location-b.location-b.length)];
                        }
                    }
                }
            }
            [xmlEncodingString release];
        }

        // Attempt to detect encoding from response header or XML declaration
        NSStringEncoding nsEncoding = 0;
        [...]

sylverb • Nov 16 '11 23:11

Thanks! The code above fixes the problem.

dodyw • Apr 12 '12 10:04