Figure out mf2 h-feed authorship

Open alexmingoia opened this issue 5 years ago • 5 comments

Source: HTML
Target: Atom/XML
Example: https://granary.io/url?input=html&output=atom&url=https://news.indieweb.org/en
Expected feed author: IndieNews en @ https://news.indieweb.org/en
Actual feed author: The first h-card on the page
Note: The feed id and title are correct, but the <author> element is not.

Suggested solution: Granary should follow the representative-h-card-parsing algorithm, and if no h-card is found then use <title> and page URL as the author, instead of incorrectly assuming the first h-card is the page's author.
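A minimal Python sketch of this suggestion follows. It assumes the page has already been parsed into mf2 JSON (a dict with `items` and `rels`, e.g. from an mf2 parser) and that the caller already has the page's `<title>` text. The three checks paraphrase the representative h-card parsing algorithm as described on the microformats wiki. URLs are compared exactly, with no normalization. All function names and the example page are hypothetical and are not granary's API.

```python
from typing import Any, Iterator, Optional

Mf2 = dict[str, Any]  # one parsed mf2 JSON object


def hcards(items: list[Mf2]) -> Iterator[Mf2]:
    """Yield every h-card depth-first, including children and embedded property values."""
    for item in items:
        if "h-card" in item.get("type", []):
            yield item
        nested = list(item.get("children", []))
        for values in item.get("properties", {}).values():
            nested.extend(v for v in values if isinstance(v, dict) and "type" in v)
        yield from hcards(nested)


def urls(card: Mf2, prop: str) -> list[str]:
    """String values of a u-* property such as url or uid."""
    return [v for v in card.get("properties", {}).get(prop, []) if isinstance(v, str)]


def representative_hcard(parsed: Mf2, page_url: str) -> Optional[Mf2]:
    cards = list(hcards(parsed.get("items", [])))
    # 1. An h-card whose uid and url both match the page URL.
    for card in cards:
        if page_url in urls(card, "uid") and page_url in urls(card, "url"):
            return card
    # 2. An h-card with a url that is also one of the page's rel=me links.
    rel_me = set(parsed.get("rels", {}).get("me", []))
    for card in cards:
        if rel_me.intersection(urls(card, "url")):
            return card
    # 3. The page's only h-card, if its url matches the page URL.
    if len(cards) == 1 and page_url in urls(cards[0], "url"):
        return cards[0]
    return None


def suggested_feed_author(parsed: Mf2, page_url: str, page_title: str) -> Mf2:
    """Representative h-card if there is one, else a synthetic h-card from <title> and page URL."""
    card = representative_hcard(parsed, page_url)
    if card is not None:
        return card
    return {"type": ["h-card"], "properties": {"name": [page_title], "url": [page_url]}}


# Hypothetical page: an unrelated h-card appears before the h-feed.
page = {
    "items": [
        {"type": ["h-card"], "properties": {"name": ["A Poster"], "url": ["https://example.com/poster"]}},
        {"type": ["h-feed"], "properties": {"name": ["Example News"]}},
    ],
    "rels": {},
}
author = suggested_feed_author(page, "https://example.com/news", "Example News")
# author is the synthetic h-card named "Example News", not "A Poster"
```

In the example the fallback fires even though the page contains an h-card: that card neither matches the page URL nor appears as a rel=me link, so it is not treated as the feed's author.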

alexmingoia, Apr 06 '20 07:04

thanks for filing! granary currently uses the authorship algorithm to find the feed author, but that's evidently for posts, not feeds. so i guess you're right, maybe i should use the h-feed's p-author mf2 property first, and if not provided, fall back to representative h-card.

snarfed, Apr 06 '20 21:04

lots more discussion on this recently on #indieweb-dev and on #microformats, but no conclusion. basically, we don't yet have an "authoritative" way to determine an h-feed's author, at least if it doesn't have an explicit p-author property. representative h-card and authorship algorithm are both related, but neither is the exact answer.

@tantek's comments here are perhaps the closest thing to a conclusion: basically, we still need to do some research and come up with an algorithm. we don't necessarily have the "right" one just yet.

snarfed, h-feed authorship is an interesting problem and worth researching & brainstorming properly, rather than seeing if h-entry approaches “just work”, because that may be overdoing it.
Better to collect examples (links, analysis) of h-feed elements that you’re trying to parse, and analyze them to figure out a minimum algorithm based on examples.
The “XML approach” would be to assume / require authors/publishers always use an author property and then “just” look for that.
While a good starting point, it’s obviously a bad approach to optimize for developer convenience rather than researching reasonable real world examples and making sure to handle them.
It’s also a bad approach to “just try” some other similar algorithm to see if it “just works”, as you’re likely making all sorts of bad assumptions by doing so.
So I disagree with both “just use representative h-card” and “just use h-entry authorship but for h-feed”.
There’s no shortcut here. If you want a good algorithm, it has to start with documenting & analyzing real world publishing examples.

snarfed, Apr 10 '20 17:04

i'm not necessarily going to take on researching and creating this new h-feed authorship algorithm, but i will take two todos here:

snarfed, Apr 10 '20 17:04

Here is the algorithm I am using to parse feed authors in the wild, quoted from indieweb/authorship/issues/4:

  1. If h-feed with p-author, author is p-author.
  2. If h-feed with u-url, and that URL has an h-card matching u-url, author is that h-card.
  3. If h-feed with u-url, and that URL has no h-card matching u-url, author URL is u-url and name is page <title>.
  4. If h-feed with no u-url or p-author, author URL is page URL and name is page <title>.
  5. If no h-feed, then no feed author.

This would at least fix the example feed parsing for this issue, setting the author to be "IndieNews en @ news.indieweb.org/en"
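The five steps above can be sketched in Python roughly as follows. This is a hypothetical illustration, not granary's code or the implementation behind the linked issue. It assumes mf2 JSON input, a caller-supplied `fetch_mf2` helper (hypothetical) that returns the parsed mf2 of another URL, and exact URL comparison. It also reads "page <title>" as the title of the page being parsed, and treats a plain-string p-author as a name; both are assumptions.

```python
from typing import Any, Callable, Iterator, Optional

Mf2 = dict[str, Any]  # one parsed mf2 JSON object


def walk(items: list[Mf2]) -> Iterator[Mf2]:
    """Yield every mf2 item depth-first, including children and embedded property values."""
    for item in items:
        yield item
        nested = list(item.get("children", []))
        for values in item.get("properties", {}).values():
            nested.extend(v for v in values if isinstance(v, dict) and "type" in v)
        yield from walk(nested)


def make_card(name: str, url: str) -> Mf2:
    """Build a minimal synthetic h-card from a name and URL."""
    return {"type": ["h-card"], "properties": {"name": [name], "url": [url]}}


def feed_author(parsed: Mf2, page_url: str, page_title: str,
                fetch_mf2: Callable[[str], Mf2]) -> Optional[Mf2]:
    feed = next((i for i in walk(parsed.get("items", []))
                 if "h-feed" in i.get("type", [])), None)
    if feed is None:
        return None  # Step 5: no h-feed, so no feed author.
    props = feed.get("properties", {})

    # Step 1: an explicit p-author wins. Assumption: a plain string is used as the name.
    if props.get("author"):
        author = props["author"][0]
        if isinstance(author, dict):
            return author
        return {"type": ["h-card"], "properties": {"name": [author]}}

    feed_urls = [u for u in props.get("url", []) if isinstance(u, str)]
    if feed_urls:
        feed_url = feed_urls[0]
        # Reuse the current parse if u-url is this page; otherwise fetch it.
        linked = parsed if feed_url == page_url else fetch_mf2(feed_url)
        # Step 2: an h-card on the u-url page whose own url matches u-url.
        for item in walk(linked.get("items", [])):
            card_urls = item.get("properties", {}).get("url", [])
            if "h-card" in item.get("type", []) and feed_url in card_urls:
                return item
        # Step 3: no matching h-card, so use u-url and the page <title>.
        return make_card(page_title, feed_url)

    # Step 4: neither u-url nor p-author, so use the page URL and <title>.
    return make_card(page_title, page_url)


def no_fetch(url: str) -> Mf2:
    raise AssertionError("u-url is this page, so no fetch should happen")


# Hypothetical page: an unrelated h-card precedes an h-feed whose u-url is the page itself.
page = {
    "items": [
        {"type": ["h-card"], "properties": {"name": ["A Commenter"], "url": ["https://example.com/commenter"]}},
        {"type": ["h-feed"], "properties": {"url": ["https://example.com/news"]},
         "children": [{"type": ["h-entry"], "properties": {"name": ["A post"]}}]},
    ],
    "rels": {},
}
author = feed_author(page, "https://example.com/news", "Example News", no_fetch)
# author is the synthetic h-card named "Example News" at https://example.com/news
```

As in the IndieNews case, the first h-card on the page is skipped because its url does not match the h-feed's u-url, so step 3 supplies the author.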

alexmingoia, Apr 11 '20 03:04

i've taken a stab at this in 8e190da85b053c0bff287cc237806d880233ef40, but it's an ugly refactoring and nowhere near usable yet, and i don't see a clear path to get it merged. open to other thoughts or attempts!

snarfed, Apr 19 '20 21:04