rss-parser
entry doesn't have guid field
Hi, when I parse https://reddit.com/.rss I get the output below.
{ title: 'Rest of Canada relieved they no longer have to cheer for Boston',
link: 'https://www.reddit.com/r/hockey/comments/bgv2hu/rest_of_canada_relieved_they_no_longer_have_to/',
pubDate: '2019-04-24T14:09:55.000Z',
author: '/u/pubwash',
content: '  submitted by   <a href="https://www.reddit.com/user/pubwash"> /u/pubwash </a>   to   <a href="https://www.reddit.com/r/hockey/"> r/hockey </a> <br/> <span><a href="https://www.thebeaverton.com/2019/04/rest-of-canada-relieved-they-no-longer-have-to-cheer-for-boston/">[link]</a></span>   <span><a href="https://www.reddit.com/r/hockey/comments/bgv2hu/rest_of_canada_relieved_they_no_longer_have_to/">[comments]</a></span>',
contentSnippet: 'submitted by /u/pubwash to r/hockey [link] [comments]',
id: 't3_bgv2hu',
isoDate: '2019-04-24T14:09:55.000Z' }
The entry has the id field, but your readme and https://github.com/bobby-brennan/rss-parser/blob/master/test/output/reddit.json show a guid field.
Parsing http://rss.cnn.com/rss/cnn_topstories.rss, on the other hand, returns guid and no id.
{ title: 'A music scholar\'s take on Beyonce\'s latest',
link: 'http://rss.cnn.com/~r/rss/cnn_topstories/~3/jHWNTTYSLlw/index.html',
pubDate: 'Mon, 22 Apr 2019 08:23:54 GMT',
content: 'On April 9, 1939 the Afric...=""/>',
contentSnippet: 'On April 9, 1939 the African-American opera singer Marian Anderson made history when she performed outdoors on the National Mall in Washington.',
guid: 'https://www.cnn.com/style/article/beyonce-homecoming-opera/index.html',
isoDate: '2019-04-22T08:23:54.000Z' }
Reddit has switched to the Atom format since these docs were generated.
Maybe we should always populate both id and guid, using whichever is available.
Makes sense. For now I just do this in my own code: id = item.guid || item.id
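For anyone else hitting this, here is a minimal sketch of that workaround applied to a whole item, so both fields end up populated no matter which feed format was parsed. The helper name withBothIds is hypothetical and not part of rss-parser; it only assumes items are plain objects shaped like the two outputs above.

```javascript
// Hypothetical helper, NOT part of rss-parser: returns a copy of the item
// with both `id` and `guid` set, using whichever one the feed provided.
function withBothIds(item) {
  const fallback = item.guid || item.id;
  // Neither field present: return an unchanged copy rather than adding
  // undefined keys.
  if (!fallback) return { ...item };
  return {
    ...item,
    id: item.id || fallback,
    guid: item.guid || fallback,
  };
}

// Atom-style entry (like the Reddit output): only `id` is set.
const atomItem = withBothIds({ title: 'Atom entry', id: 't3_bgv2hu' });

// RSS-style entry (like the CNN output): only `guid` is set.
const rssItem = withBothIds({
  title: 'RSS entry',
  guid: 'https://www.cnn.com/style/article/beyonce-homecoming-opera/index.html',
});

console.log(atomItem.guid); // same value as atomItem.id
console.log(rssItem.id);    // same value as rssItem.guid
```

If a feed ever supplies both fields with different values, this sketch keeps each one as-is rather than overwriting it.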