spaRSS
Old entries appear as new
Hi!
I use the latest spaRSS version from F-Droid. In some feeds (in particular livejournal.com blogs, but not only them) that have not been updated for a long time (several months or years), old posts can appear as new (posted a couple of weeks ago or so) and unread. It happens randomly, but quite often, with both Atom and RSS feeds. The pubDate tag in RSS and the published/updated tags in the Atom XML source are correct. Please tell me how I can help debug this issue.
I faced the same issue.
There's an option in the settings that changes the sort order of the entries. Have you tried enabling and disabling it?
@moshpirit do you mean "oldest first" setting? It was disabled. How can it impact this issue?
OK, now I switched it on and off again.
Exactly. Maybe it can temporarily help, idk
Maybe it can temporarily help
It didn't.
I experience the same problem. I think this is due to the way spaRSS analyses the publication date of an item. The parser looks first at the 'pubdate' of an item. If no date is found, it falls back to the date of the previous item, or it even falls back to the date the whole channel was built / updated! Certainly, this last datetime has nothing to do with the publication time of an item. So this will lead to the unwanted result that old items are sometimes fetched with apparently new dates and times.
If you agree that this is probably the cause of the error, my vote would be to delete those fallback scenarios from RSSAtomParser.java. If no 'pubdate' is found, just discard the whole item and go to the next.
1st, please point to the code you describe. 2nd, how does this proposal square with the standards? 3rd, even if the standards support your point, how many users will get mad when their feed items start getting dropped?
I think this is due to the way SPARSS analyses the publication date of an item. The parser looks first at the 'pubdate' of an item. If no date is found, it falls back to the date of the previous item or it even falls back to the date the whole channel was built / updated!
It's not the case. As I already mentioned, messages with correct pubDate/published/updated are affected too.
One example:
<item>
<title>Valuable asset leaving our LibreCAD team</title>
<link>http://blog.librecad.org/2016/05/valuable-asset-leaving-our-librecad-team/</link>
<comments>http://blog.librecad.org/2016/05/valuable-asset-leaving-our-librecad-team/#comments</comments>
<pubDate>Tue, 31 May 2016 19:40:23 +0000</pubDate>
<dc:creator><![CDATA[RvT]]></dc:creator>
<category><![CDATA[Help Wanted!]]></category>
<guid isPermaLink="false">http://blog.librecad.org/?p=432</guid>
This item regularly appears as new, the last time I marked it as read again was yesterday.
@ildar Thx, I think I mentioned that the issue is in RSSAtomParser.java.
@mikhirev Thank you for your example. As I understand from your example, it happens to items with correctly configured pubDate tags. That is my experience also. Existing items are presented with a new date/time. It never happens that an item is presented with a different title or image. Only the date/time sometimes gets mixed up. It is still my assumption that this is due to the way the RSSAtomParser handles mEntryDate. There are many fallback situations, and maybe mEntryDate is not reset where it should be. Did you notice what date/time the old item is presented with? Is it the build time of the channel? Is it 'now'?
At the moment, I have removed all the (in my eyes superfluous) mEntryDate handling from RSSAtomParser.java except the part where it takes the pubDate of the item. Other entry date alternatives are discarded, so mEntryDate will be null and the item is dropped (but I log the event if it happens). I rebuilt the app and ran it on my phone. On first use it seems to function correctly, but I am curious what it does over time. I'll let you know.
Did you notice what date/time the old item is presented with? Is it the build time of the channel? Is it 'now'?
It is definitely not the channel update time, because in most (or maybe all) cases the latter is the same as the entry's publication date. It is also not 'now' (or not always 'now'); it can be several hours, days or even months before now. I have no idea where this date can come from.
Well, it turned out that I was wrong. I stripped all the excessive handling of mEntryDate out of the RSSAtomParser, but today the error occurred again. A feed that had no new entries for quite a while, presented an item from about a month ago with the date of "24 May 2017". So the issue has to originate somewhere else.
My new guess is that the date/time is not parsed correctly from the RSS entries. The RSSAtomParser uses SimpleDateFormat to see whether a pattern matches the pubDate. Several patterns are given within the RSSAtomParser, but the parser is set to 'lenient'. Maybe this can give unexpected results, since a heuristic algorithm is used if the pattern does not match exactly in the first place. I do not know if this is the case, but for testing purposes I have set the lenient option to false and added some more precise patterns.
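To illustrate what 'lenient' means here, below is a minimal sketch, not code from spaRSS. The class name LenientDemo, the single pattern and the out-of-range sample date are assumptions chosen for the demonstration. It shows that a lenient SimpleDateFormat silently rolls an impossible day over into the next month, while a strict one rejects the string.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of spaRSS.
public class LenientDemo {

    // Parses text with one RSS-like pattern and returns the result as
    // "yyyy-MM-dd HH:mm:ss" in UTC, or "rejected" if parsing fails.
    static String parse(String text, boolean lenient) {
        SimpleDateFormat in = new SimpleDateFormat("dd MMM yyyy HH:mm:ss", Locale.US);
        in.setTimeZone(TimeZone.getTimeZone("UTC"));
        in.setLenient(lenient);

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
        out.setTimeZone(TimeZone.getTimeZone("UTC"));

        try {
            return out.format(in.parse(text));
        } catch (ParseException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        // Lenient: day 32 of January is rolled over to 1 February.
        System.out.println(parse("32 Jan 2017 10:00:00", true));
        // Strict: the same string is refused.
        System.out.println(parse("32 Jan 2017 10:00:00", false));
    }
}
```

With a well-formed pubDate both modes give the same result, so leniency only matters for malformed input.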
Also, it is my understanding that SimpleDateFormat is not synchronized. This means that concurrent access to the objects from different threads can give unexpected and strange results. Since @mikhirev reported a random date time and I noticed today also a seemingly random date time, this could be happening. I have tried to recode this part (as far as I understand synchronization) in order to avoid this problem.
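One way to guard a shared formatter is to synchronize every access to it. The following is only a sketch of that idea, not the actual change made in the test build. The class name SynchronizedDateParser, the single pattern, the second sample date and the helper countMismatches are assumptions for illustration.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class, not part of spaRSS.
public class SynchronizedDateParser {

    // One shared, mutable formatter: safe only while every caller holds the lock.
    private static final DateFormat SHARED_FORMAT =
            new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.US);

    static Date parse(String text) throws ParseException {
        synchronized (SHARED_FORMAT) {
            return SHARED_FORMAT.parse(text);
        }
    }

    // Parses two different dates from several threads at once and counts
    // how many results differ from the single-threaded reference values.
    static int countMismatches(int threads, final int iterations) {
        final String a = "Tue, 31 May 2016 19:40:23 +0000";
        final String b = "Wed, 24 May 2017 08:15:00 +0000";
        final long expectedA;
        final long expectedB;
        try {
            expectedA = parse(a).getTime();
            expectedB = parse(b).getTime();
        } catch (ParseException e) {
            throw new IllegalStateException(e);
        }

        final AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final boolean useA = (t % 2 == 0);
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        long got = parse(useA ? a : b).getTime();
                        if (got != (useA ? expectedA : expectedB)) {
                            mismatches.incrementAndGet();
                        }
                    } catch (ParseException | RuntimeException e) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(4, 1000));
    }
}
```

The lock serializes all parsing, which is simple but makes threads wait for each other; removing the synchronized block turns this into a data race in which wrong dates or exceptions may appear intermittently.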
I have installed the new test configuration on my phone. I will let you know what happens.
This looks more and more complex. The issue is not located in the parsing of the pubDate date/time. That is done correctly. When parsing the RSS feed, the entryDate of an item or entry is correctly retrieved.
However, some items are still retrieved with wrong entryDates. This even happens when the items are older than the time period we look back into the past for new articles. That should not happen, because the RSSAtomParser filters out entries that are too old. When the entryDate is too old, the item should not get to the part where it is stored in the database. Today I had an old item with a wrong entryDate, but even this wrong entryDate was older than the parser allows, and still it got stored! How is this possible?
I noticed that it only occurs if multiple feeds are refreshed simultaneously. A possible explanation could be that the refreshing of one feed gets in the way of the refreshing of another feed, and an item somehow gets wrongly stored in the database. This is supported by the erratic behaviour of this issue. It happens sometimes, but certainly not all of the time.
Maybe the separately running RSSAtomParsers for each feed share some resources, which get mixed up somehow. At the moment I haven't got a clue where to look, but I will dig deeper. ;-) Anyone got an idea where to look next?
Update: It looks like the handler of the parser is indeed not thread safe. So the issue may be there. My knowledge in concurrent tasks is very limited, so if someone would like to help to look into this issue, it would be much appreciated.
Yeah! I got it fixed. :-) It is indeed a bug.
The parsing of the date in the RSSAtomParser.java class is not thread-safe. This means that when RSS feeds are refreshed simultaneously, the date/time of one entry/item can get mixed up with the date/time of another entry/item in another feed.
The solution is to make the PUBDATE_DATE_FORMATS field thread-safe: instead of one variable that all the threads access, each thread gets its own local copy of the variable. To do this, replace the definition of PUBDATE_DATE_FORMATS by
private static final ThreadLocal<DateFormat[]> PUBDATE_DATE_FORMATS =
        new ThreadLocal<DateFormat[]>() {
    @Override
    protected DateFormat[] initialValue() {
        // Each thread gets its own formatter instances for RSS date time strings
        return new DateFormat[] {
            new SimpleDateFormat("dd MMM yyyy HH:mm:ss", Locale.US),
            new SimpleDateFormat("d MMM yyyy HH:mm:ss", Locale.US),
            new SimpleDateFormat("d' 'MMM' 'yy' 'HH:mm:ss", Locale.US),
            new SimpleDateFormat("d' 'MMM' 'yy' 'HH:mm:ss' 'Z", Locale.US),
            new SimpleDateFormat("d' 'MMM' 'yy' 'HH:mm:ss' 'z", Locale.US)
        };
    }
};
In the method parsePubdateDate(), the start of the loop changes slightly. Change
for (DateFormat format : PUBDATE_DATE_FORMATS) {
to
for (DateFormat format : PUBDATE_DATE_FORMATS.get()) {
For more information see also this excellent post.
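For anyone who wants to try the idea outside the app, here is a small self-contained sketch of the same ThreadLocal technique. It is not the spaRSS source: the class name ThreadLocalFormatsDemo, the shortened pattern list and the helper formatsAreDistinctPerThread are assumptions made for the demonstration.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class, not part of spaRSS.
public class ThreadLocalFormatsDemo {

    // Every thread that calls FORMATS.get() receives its own array of formatters.
    private static final ThreadLocal<DateFormat[]> FORMATS =
            new ThreadLocal<DateFormat[]>() {
        @Override
        protected DateFormat[] initialValue() {
            return new DateFormat[] {
                new SimpleDateFormat("dd MMM yyyy HH:mm:ss", Locale.US),
                new SimpleDateFormat("d' 'MMM' 'yy' 'HH:mm:ss' 'Z", Locale.US)
            };
        }
    };

    // Tries each pattern in turn; returns null if none of them matches.
    static Date parse(String text) {
        for (DateFormat format : FORMATS.get()) {
            try {
                return format.parse(text);
            } catch (ParseException ignored) {
                // fall through to the next pattern
            }
        }
        return null;
    }

    // Checks that a second thread sees different formatter objects than the caller.
    static boolean formatsAreDistinctPerThread() {
        final DateFormat[] mine = FORMATS.get();
        final DateFormat[][] other = new DateFormat[1][];
        Thread t = new Thread(() -> other[0] = FORMATS.get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return other[0] != null && other[0] != mine && other[0][0] != mine[0];
    }

    public static void main(String[] args) {
        System.out.println(parse("31 May 2016 19:40:23 +0000") != null);
        System.out.println(formatsAreDistinctPerThread());
    }
}
```

Compared with locking a shared formatter, this costs a few extra objects per thread but lets the feeds be parsed in parallel without waiting on each other.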