sockethub icon indicating copy to clipboard operation
sockethub copied to clipboard

Fix Feeds results, with tests

Open silverbucket opened this issue 3 years ago • 2 comments

This makes sure the feed fetching results, which is an array of activity objects, is passed back to the client. Includes integration tests to make sure any breakage is caught in the future.

silverbucket avatar Aug 28 '22 22:08 silverbucket

@raucao this is ready for review, you already did the heavy lifting in your first review, just need a last check and approval.

silverbucket avatar Sep 23 '22 16:09 silverbucket

... btw, we could also merge as is and address the functional questions in new PRs later.

raucao avatar Sep 24 '22 13:09 raucao

@raucao hey, for the html vs text fields, what do you think about stripping the html from the text fields? If the source text is not html, they'll both be plain text, but if the source is html, then there will be the text alternative.

As for the date, what do you suggest? that there be published property at the root of each activity stream (eg. one property for each AS in the array) feed[2].published. I guess that would be the most uniform as that's how we do it for XMPP yeah? I'm just wondering if we want to differentiate between the date of the post, vs the date the post was fetched, not sure.

silverbucket avatar Oct 01 '22 16:10 silverbucket

@raucao hey, for the html vs text fields, what do you think about stripping the html from the text fields? If the source text is not html, they'll both be plain text, but if the source is html, then there will be the text alternative.

I just had a quick look at the RSS and ATOM specs. In RSS, encoded HTML is allowed in description, but it's up to the client to parse. In ATOM, text elements like summary and content should have a type attribute that specifies if it's HTML or plaint text, or even something else.

I'm not sure if the Sockethub platform is the right place to implement content formats and conversions to be honest. But it should always send all possible information to the client, of course. It would certainly make client implementation more convenient if at least the content type information would exist for every item, but I'm not sure doubling or even quadrupling the content payload is the way to go.

So I would maybe propose to only send a summary/brief value, if it's actually different from the description or content (i.e. the feed is an ATOM feed and content is different from summary), and I would probably not send _text and _html, but just content plus content_type if it's available.

As for the date, what do you suggest? that there be published property at the root of each activity stream (eg. one property for each AS in the array) feed[2].published. I guess that would be the most uniform as that's how we do it for XMPP yeah? I'm just wondering if we want to differentiate between the date of the post, vs the date the post was fetched, not sure.

The client knows when it fetched the post, but there are two possible dates from the feed that it may want: the date when the item was originally published (I believe that's what published is for in AS objects), and the date that the item was last updated. The latter is actually the only required date element for ATOM feed entries, while published is optional, and pubDate is optional in RSS, too.

raucao avatar Oct 02 '22 19:10 raucao

@raucao OK I think I've addressed both of your comments regarding duplicate payload (i now also provide contentType) and the date property naming to published. Let me know what you think.

silverbucket avatar Oct 07 '22 12:10 silverbucket

@raucao OK, integration tests have been fixed!

silverbucket avatar Oct 07 '22 15:10 silverbucket

Just FYI - the compliance is all passing, it's just that I consolidated the jobs test, lint, coverage into one compliance to reduce noise. the last time these ran was before compliance existed so it thinks they are missing.

silverbucket avatar Oct 11 '22 15:10 silverbucket

I'm wondering what status: true is:

Screenshot from 2022-10-12 12-05-30

raucao avatar Oct 12 '22 10:10 raucao

@raucao nice catch, none of the feeds I tested with had empty descriptions. It was actually a small fix. As for name vs title, the reason I went with name is for object consistency, but agree title probably makes more sense. I've switched it out. Eventually I'll have strict schema definitions for these things but that's out of scope of this ticket.

Ready again for review :)

silverbucket avatar Oct 12 '22 10:10 silverbucket

One more thing: I don't think there should be a value for updated when the source feed does not contain that property.

raucao avatar Oct 12 '22 10:10 raucao

One more thing: I don't think there should be a value for updated when the source feed does not contain that property.

The feedparser API doesn't really distinguish, the date property is always either the same as pubDate or more recent (e.g. updated). So I've added a comparison, if they are the same it's set to undefined (and therefore not passed along).

silverbucket avatar Oct 12 '22 10:10 silverbucket