
instead of `X-` headers, use link relations/linked data

daxim opened this issue 11 years ago · 2 comments

X- headers are the wrong solution to this kind of problem. They are not interoperable in the sense that you do not get to authoritatively define the semantics. Instead you need to convince the rest of the whole world to join your private club of X-Human-* implementors, which is unlikely to happen.

Here's how you do it properly on the Web. Relations between entities are expressed by typed hyperlinks. Entities are addressed with URIs. The type of the hyperlink is an IANA-registered string or a URI. Link relations can be expressed in hypertext documents (e.g. `a`, `link`, `area` elements with a `rel` attribute in HTML; XLink in XML; the `_links` property in HAL+JSON) or as an RFC 5988 HTTP Link header.

Example: you have just made http://example.net/something and want to express your authorship. Add the HTTP headers

Link: <mailto:[email protected]>; rel="author"
Link: <http://randomdrake.com/hello-there/#me>; rel="author"
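On the consuming side, such Link headers are easy to pick apart. Here is a minimal Python sketch (illustrative only, not a full RFC 5988 parser; it ignores quoted commas and other corner cases, and the `[email protected]` address is a made-up stand-in):

```python
import re

def parse_link_header(value):
    """Split an RFC 5988 Link header value into (target URI, params) pairs.
    Illustrative only; a full implementation must honour quoted commas etc."""
    links = []
    for part in re.split(r',\s*(?=<)', value):  # split on commas that precede '<'
        m = re.match(r'<([^>]*)>\s*(.*)', part)
        if not m:
            continue
        uri, rest = m.group(1), m.group(2)
        # collect ;key="value" or ;key=value parameters
        params = dict(re.findall(r';\s*([\w-]+)="?([^";]*)"?', rest))
        links.append((uri, params))
    return links

header = ('<mailto:[email protected]>; rel="author", '
          '<http://randomdrake.com/hello-there/#me>; rel="author"')
authors = [uri for uri, p in parse_link_header(header) if p.get('rel') == 'author']
```

A UA that wants the author links simply filters on `rel="author"`, as in the last line.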

If "something" happens to be a hypertext document (here, HTML), you can also inline links:

<head>
    <link href="mailto:[email protected]" rel="author" />
    <link href="http://randomdrake.com/hello-there/#me" rel="author" />

or perhaps

<footer>
    <p>Made with lots of love by <address>
        <a href="http://randomdrake.com/hello-there/#me" rel="author">
            randomdrake
        </a></address>.
    </p>

Dereferencing that second hyperlink returns some semantically rich hypertext document, e.g. HTML5 with RDFa lite:

<section vocab="http://schema.org/" typeof="Person">
    My name is <span property="name">David Drake</span>
    and for purposes of illustration only, claim to
    <a property="worksFor" href="http://example.com/">
    work for <span typeof="Corporation" property="name">
    Acme Widgets, Inc.</span></a>.
    ⋮
    <ul id="social-links" property="account">
        <li vocab="http://rdfs.org/sioc/ns#" typeof="UserAccount"><a href="https://twitter.com/randomdrake" title="Follow Me on Twitter">
            <i class="icon-twitter"><span property="accountName">@randomdrake</span></i>
        </a></li>
        <li vocab="http://rdfs.org/sioc/ns#" typeof="UserAccount"><a href="https://github.com/randomdrake" title="Check Out My Github">
            <i class="icon-github"><span property="accountName">randomdrake</span></i>
        </a></li>
    </ul>
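To make it concrete that machines can consume this markup, here is a toy Python sketch (stdlib only, nowhere near a conforming RDFa processor) that pulls `property`/text pairs out of such a fragment:

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from RDFa-annotated HTML.
    A toy extractor for illustration, not a conforming RDFa processor."""
    def __init__(self):
        super().__init__()
        self._stack = []   # property attribute of each currently open element
        self.found = []
    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get('property'))
    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()
    def handle_data(self, data):
        # attribute text to the innermost open element carrying a property
        for prop in reversed(self._stack):
            if prop and data.strip():
                self.found.append((prop, data.strip()))
                break

extractor = PropertyExtractor()
extractor.feed('<section vocab="http://schema.org/" typeof="Person">'
               'My name is <span property="name">David Drake</span></section>')
```

After `feed`, `extractor.found` holds the name property with its text; a real consumer would use an RDFa library instead.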

That's the idea of linked data. At Acme's homepage, they might express their intent with e.g.

<a rel="http://nazou.fiit.stuba.sk/nazou/ontologies/v0.6.17/offer-job"
    href="http://example.com/hiring">We're hiring!</a>

As you see, the semantics of those four pieces of metadata in your proposal are already expressed in existing vocabularies, backed by numerous solid Internet and Web standards, and there is already a huge ecosystem of working software that can process these data. I think X-Human-* should be killed off in favour of the above.

daxim commented Feb 07 '14 01:02

Thanks very much for the writeup.

I like the direction that you are taking with this. Over in issue #1 we had a good discussion as to why it was a good/bad idea/implementation.

Ultimately, it seems that a combination of embracing the work already done for http://humanstxt.org/, since it's a defined and already adopted standard, along with a single header referencing the resource, may be the best course of action.

Using your suggestions and the results of that discussion, it seems the best of both worlds, currently, would be to have a single header of Link: <http://example.com/humans.txt>; rel="author"

Paired with a browser extension that looks for that header, grabs that file, and parses it for display on the fly, it sounds like it may achieve the "delight" factor of stumbling across creators leaving their mark.

Would you mind commenting on whether you think that is a viable solution that fits in with the standards you referenced and proposed?

randomdrake commented Feb 13 '14 01:02

http://humanstxt.org/ [is] a defined […] standard

I had a good look at it and must dissent. Nothing there reaches even the lower bounds of rigour expected of an internet standard; the people responsible apparently never read a good book on information design or participated in a relevant standards body mailing list (e.g. IETF, W3C), where experienced engineers dish out free advice on how to write a proper standard.

This means that, unfortunately, yonder Standard.html document is a completely ad-hoc description; there isn't even a complete example, just a vague skeleton. Anything goes, because, hey, as the humanstxt people say, it's for humans only. But they're missing the point: machine traffic surpassed human traffic on the Web years ago, so formats that can be consumed by both humans and machines (like HTML) are the big success. Excluding machines on purpose is a mind-bogglingly bad idea.

grab that file, and parse it for display

… which makes that plan a hopeless undertaking, because a parser essentially falls into the category of machines. You might look at humans.txt examples in the wild and derive an ad-hoc parser from them. However, like Sisyphus, you will forever be working to catch up with website authors who deploy humans.txt and come up with new extensions for it, modifying your parser to extract the new information, yet never achieving full fidelity, because it's a moving target: you cannot be aware of all the humans.txt deployments on the Web.

This is a plain consequence of the humanstxt people choosing and implementing a basically unrestricted ad-hoc format. Anything a website author writes into humans.txt is equally valid, and by that I do not mean only intentional extensions. Consider what happens when a humans.txt author typos a field name, e.g. Twiter:. He won't notice that simple mistake, and consequently your parser will fail or misparse.

Real standards have validators, and are designed in such a way that they can have validators and thus enable interoperability. My specific critique of humanstxt is that there is no formal definition (e.g. an ABNF grammar), no description of a failure mode (what should a UA do when it cannot parse or needs to disambiguate?), no content type (how does a UA recognise the format?), and there has been no discussion in the Web community about the usual unforeseen cross-cutting concerns like security, i18n, a11y and registering an RFC 5785 well-known location.
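To make the validator point concrete: because humanstxt never enumerates its fields, any linter has to invent a field list. A hypothetical Python sketch (the KNOWN_FIELDS set is my own guess, which is precisely the problem) shows how a closed field set would catch the Twiter: typo:

```python
# Hypothetical field list: the humanstxt "standard" defines no closed set,
# so these names are invented for illustration.
KNOWN_FIELDS = {'Author', 'Site', 'Twitter', 'Location', 'Last update'}

def lint_humans_txt(text):
    """Warn about field names outside the (hypothetical) allowed set."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), 1):
        if ':' in line and not line.startswith(('#', '/*')):
            field = line.split(':', 1)[0].strip()
            if field and field not in KNOWN_FIELDS:
                warnings.append((lineno, field))
    return warnings

sample = "Author: David Drake\nTwiter: @randomdrake\n"
warnings = lint_humans_txt(sample)   # the typo'd field name is flagged
```

With a real ABNF grammar, such a linter could be derived mechanically instead of guessed at.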

This makes it bad enough for me to not bother using it, but you're likely looking for a solution: bypass the bad standard as much as possible; encourage authors to use good standards. The goal is to put pressure on humanstxt to evolve into something decent, or failing that to displace it. Website authors who wish to credit the humans involved can provide multiple documents with the "author" relationship:

Link: <http://example.com/humans.txt>;
    rel="author";
    type="text/plain; profile=\"http://humanstxt.org/Standard.html\""
Link: <http://example.com/site-contributors.html>;
    rel="author"; type="application/xhtml+xml"

The same expressed in HTML:

<link
    href="http://example.com/humans.txt"
    rel="author"
    type='text/plain; profile="http://humanstxt.org/Standard.html"' />
<link
    href="http://example.com/site-contributors.html"
    rel="author" type="application/xhtml+xml" />
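A UA can discover such author links with a few lines of stdlib Python; this sketch handles only the simple cases shown above (no base-URI resolution, no space-separated multi-value rel beyond splitting):

```python
from html.parser import HTMLParser

class AuthorLinkFinder(HTMLParser):
    """Collect href values of link/a/area elements whose rel includes 'author'."""
    def __init__(self):
        super().__init__()
        self.hrefs = []
    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get('rel') or '').split()  # rel is space-separated
        if tag in ('link', 'a', 'area') and 'author' in rels:
            self.hrefs.append(attrs.get('href'))

doc = ('<link href="http://example.com/humans.txt" rel="author" />'
       '<link href="http://example.com/site-contributors.html" rel="author" />')
finder = AuthorLinkFinder()
finder.feed(doc)
```

After `feed`, `finder.hrefs` lists both author resources, which the UA can then fetch or negotiate over.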

When authors have control over the server configuration, it's possible to simplify this to just one resource which has multiple representations, and the UA initiates negotiation on the content type:

Link: <http://example.com/humans>; rel="author"


GET /humans HTTP/1.1
Host: example.com
Accept: text/html, application/xhtml+xml; q=0.9, text/plain; q=0.6

HTTP/1.1 200 OK
Content-Type: application/xhtml+xml
Vary: Accept
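Server-side, honouring that Accept header means comparing q-values. A simplified Python sketch (ignoring wildcards and media-type parameters other than q):

```python
def best_match(accept_header, available):
    """Pick the best available media type per the Accept header's q-values.
    Simplified: no wildcard matching, no parameters other than q."""
    prefs = []
    for item in accept_header.split(','):
        parts = [p.strip() for p in item.split(';')]
        q = 1.0  # absent q means quality 1.0
        for p in parts[1:]:
            if p.startswith('q='):
                q = float(p[2:])
        prefs.append((parts[0], q))
    prefs.sort(key=lambda pair: -pair[1])  # highest quality first
    for media_type, q in prefs:
        if media_type in available and q > 0:
            return media_type
    return None

chosen = best_match('text/html, application/xhtml+xml; q=0.9, text/plain; q=0.6',
                    {'application/xhtml+xml', 'text/plain'})
# text/html is not available here, so the XHTML representation is chosen
```

This matches the exchange above: the server has no text/html representation, so the next-preferred application/xhtml+xml wins and is sent with Vary: Accept.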

The document itself can be structured in a much greater variety of ways than humanstxt's semi-structured lists of tagged paragraphs. As I've shown in the example in my first post, HTML allows writing in prose, resulting in better readability for humans, while machines can still make sense of it as long as the relevant data are properly marked up. This is pretty cool: lots of expressivity and power, and you as the UA programmer don't even need to do much programming, because there are already libraries for HTML and RDFa for every language under the sun. Investigate these vocabularies, which are useful to express most of the semantics of humanstxt: Schema.org for a person (including job title/nationality), SIOC for user accounts (e.g. Twitter username), WGS84 for geolocation.

PS: Be aware of prior art for your browser extension. In Opera and Seamonkey, link relation navigation bars are built-in (screenshot, screenshot). For Firefox, there's http://www.bolwin.com/software/snb.shtml. These navigation bars work for any link relation, not just "author".

daxim commented Feb 13 '14 04:02