FeedEntryMessageSource fails if entry date fields are null [INT-1810]
David Turanski opened INT-1810 and commented
feed:inbound-channel-adapter outputs nothing for the attached feed content from http://feeds.feedburner.com/NF-NewestInstantTitles (Netflix). The entries are not added because they do not include a published date or an entry date. I'm a newb when it comes to RSS, but the bug appears to be in FeedEntryMessageSource.populateEntryList().
Affects: 2.0.3
Attachments:
- netflix-feed.xml (38.27 kB)
Oleg Zhurakousky commented
I am changing this to an Improvement, as it was discussed during the initial implementation and we decided to live with it for now. The real issue is how to distinguish an entry that was already read from a new or updated one.
Oleg Zhurakousky commented
Well, this one is tough. I need to think about it some more.
The core of the issue is that MetadataStore works by remembering only the latest entry retrieved, based on its creation date.
if (entryDate != null && entryDate.getTime() > this.lastTime) {
// save the entry
}
The lastTime value comes from MetadataStore. Disabling the check would cause another issue: the same data would be retrieved on each poll, producing exactly the DUPLICATES that MetadataStore was supposed to prevent.
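To make the effect of that check concrete, here is a minimal, self-contained sketch. It is not the adapter's actual code: the `Entry` class and the `selectNew` method are hypothetical stand-ins (the real adapter works with ROME's SyndEntry), and only the condition itself mirrors the snippet quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;

public class LastTimeFilterSketch {

    // Hypothetical stand-in for a feed entry; not the real SyndEntry type.
    static final class Entry {
        final String title;
        final Date date; // null when the feed omits published/updated dates

        Entry(String title, Date date) {
            this.title = title;
            this.date = date;
        }
    }

    // Same condition as the quoted check: only dated entries newer than lastTime pass.
    static List<Entry> selectNew(List<Entry> entries, long lastTime) {
        List<Entry> accepted = new ArrayList<>();
        for (Entry entry : entries) {
            if (entry.date != null && entry.date.getTime() > lastTime) {
                accepted.add(entry);
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
                new Entry("dated-old", new Date(1_000L)),
                new Entry("dated-new", new Date(3_000L)),
                new Entry("dateless", null));
        // With lastTime = 2000, only "dated-new" is printed.
        for (Entry entry : selectNew(entries, 2_000L)) {
            System.out.println(entry.title);
        }
    }
}
```

The dateless entry is dropped no matter what lastTime is, which is why a feed with no dates at all produces no messages.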
David Turanski commented
It's not as bad as that. AFAICT an RSS feed will return http status 304 Not Modified with an empty message body unless the content has changed. I observed this w/ TCP Monitor on a couple of feed URLs I tested. If the feed is updated, you get all entries not just the deltas. Maybe provide an option to disable the filter?
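For reference, a 304 Not Modified response is normally the result of a conditional GET, where the client sends back a validator from an earlier response. The sketch below assumes that is what was observed here; it uses the JDK's java.net.http API (Java 11+), not whatever HTTP client the adapter uses internally, and the class and method names are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConditionalGetSketch {

    // Builds a GET that asks the server to answer 304 if the ETag still matches.
    // The etag value would come from the ETag header of a previous response.
    static HttpRequest conditionalRequest(URI feedUri, String etag) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(feedUri).GET();
        if (etag != null) {
            builder.header("If-None-Match", etag);
        }
        return builder.build();
    }

    // 304 means the feed is unchanged and the response body is empty.
    static boolean isNotModified(int statusCode) {
        return statusCode == 304;
    }

    public static void main(String[] args) {
        // No network call is made here; this only shows the request headers.
        HttpRequest request = conditionalRequest(
                URI.create("http://feeds.feedburner.com/NF-NewestInstantTitles"), "\"abc\"");
        System.out.println(request.headers().map());
    }
}
```

If the poller skipped processing whenever isNotModified returned true, unchanged feeds would produce no duplicates, but a changed feed would still deliver every entry again.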
Oleg Zhurakousky commented
I have to see how we are calling it and whether I have access to the status, since when I tried the URL you provided here, it gave me the same results on every poll. Also, even if that works, it will only be a partial solution, since as soon as an update happens we'll have to deal with everything again.
Oleg Zhurakousky commented
Moving it to 2.1. Need to do a bit more thinking. Not as simple as it sounds based on how we track what's been read. Most likely need enhancements to MetadataStore strategy
Oleg Zhurakousky commented
Moving it to 2.2. Not sure how we can address this with the current infrastructure.
Artem Bilan commented
Well, I'd say the best solution here would be a SyndEntryDateStrategy:
public interface SyndEntryDateStrategy {
Date entryDate(SyndEntry entry, SyndFeed feed);
}
This would give the end user full control over those "dateless" feeds.
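As an illustration only, one possible implementation of such a strategy might fall back from the entry's own dates to the feed-level date. In this sketch the SyndEntry and SyndFeed interfaces are minimal stand-ins declaring just the getters used (the real types live in ROME), and FALLBACK_TO_FEED_DATE plus the entry/feed helpers are hypothetical names.

```java
import java.util.Date;

public class DateStrategySketch {

    // Minimal stand-ins for the ROME types; only the getters used here are declared.
    interface SyndEntry {
        Date getPublishedDate();
        Date getUpdatedDate();
    }

    interface SyndFeed {
        Date getPublishedDate();
    }

    // The strategy interface as proposed in this comment.
    interface SyndEntryDateStrategy {
        Date entryDate(SyndEntry entry, SyndFeed feed);
    }

    // Hypothetical strategy: entry published date, then entry updated date,
    // then the feed's published date (which may itself be null).
    static final SyndEntryDateStrategy FALLBACK_TO_FEED_DATE = (entry, feed) -> {
        if (entry.getPublishedDate() != null) {
            return entry.getPublishedDate();
        }
        if (entry.getUpdatedDate() != null) {
            return entry.getUpdatedDate();
        }
        return feed.getPublishedDate();
    };

    // Test helpers that build stand-in entries and feeds.
    static SyndEntry entry(Date published, Date updated) {
        return new SyndEntry() {
            public Date getPublishedDate() { return published; }
            public Date getUpdatedDate() { return updated; }
        };
    }

    static SyndFeed feed(Date published) {
        return () -> published;
    }

    public static void main(String[] args) {
        Date feedDate = new Date(5_000L);
        // A dateless entry takes the feed's date, so this prints 5000.
        Date resolved = FALLBACK_TO_FEED_DATE.entryDate(entry(null, null), feed(feedDate));
        System.out.println(resolved.getTime());
    }
}
```

With this particular fallback, every dateless entry shares the feed's date, so all of them would be re-emitted whenever the feed date advances. Whether that trade-off is acceptable is exactly the kind of decision the strategy leaves to the user.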
Artem Bilan commented
Similar problem: http://stackoverflow.com/questions/36859724/how-to-parse-rss-feeds-with-spring-integration-when-pubdate-not-available
Artem Bilan commented
One more use-case: https://stackoverflow.com/questions/44435815/spring-integration-feed-inbound-channel-adapter-duplicate-entries