RSS-Parser
RSS-Parser copied to clipboard
Parsing feed fails if it has html encoded characters
Describe the bug
I tried to parse the feed https://myrskyla.fi/feed/ but it contains in a title tag Ä
instead of Ä
which then leads to exceptions and failing to parse feed both on android and ios side.
android:
RssParsingException(message=Something went wrong during the parsing of the feed. Please check if the XML is valid, cause=org.xmlpull.v1.XmlPullParserException: unresolved: ä (position:TEXT @11:22 in java.io.InputStreamReader@4290534) )
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:67)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693)
Caused by: org.xmlpull.v1.XmlPullParserException: unresolved: ä (position:TEXT @11:22 in java.io.InputStreamReader@4290534)
at com.android.org.kxml2.io.KXmlParser.checkRelaxed(KXmlParser.java:305)
at com.android.org.kxml2.io.KXmlParser.readEntity(KXmlParser.java:1285)
at com.android.org.kxml2.io.KXmlParser.readValue(KXmlParser.java:1402)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:393)
at com.android.org.kxml2.io.KXmlParser.next(KXmlParser.java:313)
at com.android.org.kxml2.io.KXmlParser.nextText(KXmlParser.java:2077)
at com.prof18.rssparser.internal.XmlPullParser_Kt.nextTrimmedText(XmlPullParser+.kt:5)
at com.prof18.rssparser.internal.rss.RssParserKt.extractRSSContent(RssParser.kt:289)
at com.prof18.rssparser.internal.AndroidXmlParser$parseXML$2.invokeSuspend(AndroidXmlParser.kt:54)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:104)
at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:111)
at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:99)
at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:585)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:802)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:706)
at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:693)
ios:
0 composeui 0x10e50c5d7 kfun:kotlin.Throwable#<init>(){} + 95 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Throwable.kt:32:28)
1 composeui 0x10e50589f kfun:kotlin.Exception#<init>(){} + 87 (/opt/buildAgent/work/b2e1db4d8d903ca4/kotlin/kotlin-native/runtime/src/main/kotlin/kotlin/Exceptions.kt:21:35)
2 composeui 0x110063c33 kfun:com.prof18.rssparser.exception.RssParsingException#<init>(kotlin.String?;kotlin.Throwable?){} + 107 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/commonMain/kotlin/com/prof18/rssparser/exception/RssParsingException.kt:12:5)
3 composeui 0x11008ed37 kfun:com.prof18.rssparser.internal.IosXmlParser.parseXML$lambda$3$lambda$1#internal + 299 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:32:33)
4 composeui 0x11008fc37 kfun:com.prof18.rssparser.internal.IosXmlParser.$parseXML$lambda$3$lambda$1$FUNCTION_REFERENCE$2.invoke#internal + 103 (/Users/runner/work/RSS-Parser/RSS-Parser/rssparser/src/iosMain/kotlin/com/prof18/rssparser/internal/IosXmlParser.kt:26:13)
The link of the RSS Feed https://myrskyla.fi/feed/
I was able to fix it by replacing this (and some more likely offending chars http://www.javascripter.net/faq/accentedcharacters.htm) manually:
val feedString = xmlFetcher.fetchXmlAsString(url)
val feedStringFixed = feedString
.replace("& auml;", "Ä")
.replace("& Ouml;", "Ö")
val channel = parser.parse(feedStringFixed)
But i needed to fetch the feed myself because built-in XmlFetcher is internal class. So would be good to
- try unescaping chars if parsing fails or/and making XmlFetcher interface accessible
- add possibility to override or use XmlFetcher.