Activity Streams should allow stating that activities should not be tracked (robots.txt)

Open akihikodaki opened this issue 8 years ago • 3 comments

Please Indicate One:

  • [ ] Editorial
  • [ ] Question
  • [ ] Feedback
  • [ ] Blocking Issue
  • [X] Non-Blocking Issue

Please Describe the Issue: Mastodon implemented a feature to set the robots meta tag on HTML representations of objects (https://github.com/tootsuite/mastodon/issues/1599), which controls the behavior of robots on the Web. However, Mastodon is also an ActivityPub application, and robots can exist in the federation as well. Those bots cannot understand that intention, because it is expressed only in the HTML.

To solve this, Activity Streams should allow stating that activities should not be tracked by robots. My suggestion is to extend the Activity Vocabulary by adding a robots property to Object. Its value could be the same as, or similar to, the content of the HTML robots meta tag.
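To make the proposal concrete, here is a minimal Python sketch of a Note carrying such a `robots` property. It assumes the value reuses the comma-separated syntax of the HTML robots meta tag, and it maps the term to a hypothetical IRI (`https://example.org/ns#robots`). Neither the term nor the IRI is defined by AS2 or by any published extension, and `make_note` is an illustrative name.

```python
import json

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Hypothetical IRI for the proposed term; not part of AS2 or any published extension.
ROBOTS_TERM_IRI = "https://example.org/ns#robots"


def make_note(note_id: str, content: str, robots: str) -> dict:
    """Build an AS2 Note carrying a hypothetical `robots` property.

    The value reuses the comma-separated syntax of the HTML robots meta tag,
    e.g. "noindex" or "noindex, nofollow".
    """
    return {
        # The inline term definition lets JSON-LD processors expand `robots`
        # instead of silently dropping it as an undefined term.
        "@context": [AS2_CONTEXT, {"robots": ROBOTS_TERM_IRI}],
        "id": note_id,
        "type": "Note",
        "content": content,
        "robots": robots,
    }


note = make_note("https://social.example/notes/1", "Hello, fediverse!", "noindex")
print(json.dumps(note, indent=2))
```

The inline term definition matters: JSON-LD expansion drops keys that do not map to an IRI, so without it the property would be lost for any consumer that processes the object as JSON-LD.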

akihikodaki · Aug 19 '17 11:08

This has been discussed before in the ActivityPub issue tracker. I believe https://github.com/w3c/activitypub/issues/221#issuecomment-300205759 represents the consensus of the working group, although it may reflect only Evan's view. Either way, I suspect his answer will be the same here.

strugee · Aug 19 '17 17:08

Sorry, I missed that issue. It is exactly the problem I want to address. However, I have some arguments for this approach over using audience, which is why I thought Activity Streams, rather than ActivityPub, should be extended, and opened this issue.

  1. audience cannot represent the partial restrictions expressible with the robots meta tag and robots.txt.

The robots meta tag and robots.txt define the following restrictions:

  • noindex in meta tag: the page should not be indexed.
  • nofollow in meta tag: the links in the page should not be followed.
  • Disallow in robots.txt: the page should not be crawled (fetched) at all.

These are independent restrictions, and a page administrator can express partial restrictions by choosing which directives to include in the meta tag or robots.txt. For example, noindex alone means robots may still follow links in the page. That is exactly what Mastodon does (see https://github.com/tootsuite/mastodon/pull/4199). In such cases, robots are still part of the page's audience.
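The independence of these directives can be illustrated with a small parser for meta-tag style values. This is a sketch, not any crawler's actual rule set: it handles only `noindex` and `nofollow`, treats directives as case-insensitive, and ignores unknown ones. robots.txt `Disallow` is path-based rather than per-object and is not modelled here. `RobotsPolicy` and `parse_robots` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotsPolicy:
    index: bool = True   # False when "noindex" is present
    follow: bool = True  # False when "nofollow" is present


def parse_robots(value: str) -> RobotsPolicy:
    """Parse a robots meta-tag style value such as "noindex, nofollow".

    Directives are comma-separated and case-insensitive; unknown ones are ignored.
    """
    directives = {d.strip().lower() for d in value.split(",") if d.strip()}
    return RobotsPolicy(
        index="noindex" not in directives,
        follow="nofollow" not in directives,
    )


# Mastodon-style partial restriction: do not index, but links may be followed.
policy = parse_robots("noindex")
print(policy)  # RobotsPolicy(index=False, follow=True)
```

A `noindex`-only value therefore leaves link-following allowed, a distinction that a yes/no audience-membership check cannot express.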

  2. Compatibility with the robots meta tag

Giving the robots property content similar to the robots meta tag improves compatibility. This matters because Activity Streams applications are often Web applications as well.
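One way to see the compatibility benefit: a robot that already reads the HTML meta tag could read the proposed property and feed both into the same downstream logic. The following Python sketch returns the robots value from either representation of a fetched document. The `robots` JSON key is the property proposed in this issue (not part of AS2), and `RobotsMetaParser` and `robots_value` are illustrative names.

```python
import json
from html.parser import HTMLParser
from typing import Optional


class RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.values: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # HTMLParser lowercases tag and attribute names
        name = (attr.get("name") or "").lower()
        if tag == "meta" and name == "robots" and attr.get("content"):
            self.values.append(attr["content"])


def robots_value(body: str, content_type: str) -> Optional[str]:
    """Return the robots directives of a fetched document, or None if absent.

    HTML pages use the meta tag; AS2 JSON uses the hypothetical `robots` property.
    """
    if content_type.startswith("text/html"):
        parser = RobotsMetaParser()
        parser.feed(body)
        parser.close()
        return ", ".join(parser.values) or None
    if content_type.startswith(("application/activity+json", "application/ld+json")):
        data = json.loads(body)
        value = data.get("robots") if isinstance(data, dict) else None
        return value if isinstance(value, str) else None
    return None


page = '<html><head><meta name="robots" content="noindex"></head></html>'
note = '{"type": "Note", "content": "Hi", "robots": "noindex"}'
print(robots_value(page, "text/html"))                  # noindex
print(robots_value(note, "application/activity+json"))  # noindex
```

Because both paths yield the same string, whatever parses the meta tag value today could be reused unchanged for federated objects.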

  3. robots is better suited to standardization, while audience is more implementation-dependent.

Activity Streams does not define what values audience should contain, so its interpretation depends more on implementations. A robots property, by contrast, could become a standard, just as robots.txt is a de facto standard.

akihikodaki · Aug 20 '17 02:08

This is a cool idea, but I'm not sure it should be a long-standing open issue here.

If I were you and still needed this, I'd write a short document explaining it (copy-paste?) and host it somewhere like https://mastodon.social/activitystreams-extensions/robots

Anyone can then add 'robots' to their JSON objects by defining it in the @context.
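On the consumer side, a federated bot could honor the property only when the object's @context actually defines it, which avoids misreading an unrelated `robots` key from some other vocabulary. This Python sketch is deliberately simplified: it inspects only inline term definitions and does not fetch remote contexts or resolve prefixes and `@vocab` as a full JSON-LD processor would. `https://example.org/ns#robots` is a hypothetical IRI standing in for whatever an extension document would define, and `extension_robots` is an illustrative name.

```python
from typing import Any, Optional

# Hypothetical IRI that an extension document might define for the term.
EXPECTED_ROBOTS_IRI = "https://example.org/ns#robots"


def _inline_contexts(ctx: Any) -> list:
    """Return the inline (object) entries of an @context value, in order."""
    entries = ctx if isinstance(ctx, list) else [ctx]
    return [e for e in entries if isinstance(e, dict)]


def extension_robots(obj: dict) -> Optional[str]:
    """Return obj["robots"] only if an inline @context maps it to EXPECTED_ROBOTS_IRI.

    Simplified: ignores remote contexts, prefixes, and @vocab, which a real
    JSON-LD processor would resolve.
    """
    iri = None
    for entry in _inline_contexts(obj.get("@context")):
        if "robots" in entry:
            definition = entry["robots"]
            # A term maps to an IRI string or to an expanded definition with "@id";
            # later context entries override earlier ones.
            iri = definition.get("@id") if isinstance(definition, dict) else definition
    value = obj.get("robots")
    if iri == EXPECTED_ROBOTS_IRI and isinstance(value, str):
        return value
    return None


obj = {
    "@context": ["https://www.w3.org/ns/activitystreams", {"robots": EXPECTED_ROBOTS_IRI}],
    "type": "Note",
    "robots": "noindex",
}
print(extension_robots(obj))  # noindex
```

Requiring the term definition trades some leniency for safety: an object that uses `robots` without defining it is simply treated as carrying no directive.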

gobengo · Jan 25 '18 00:01

This is an interesting idea. It's also a topic of much conversation in the fediverse. It's not currently part of AS2, so it would need to be an extension. Extensions are well documented in the AS2 core document:

https://www.w3.org/TR/activitystreams-core/#extensibility

We do have a list of well-known extensions, so if this is widely used, we should probably include it.

For now, I'm going to close this issue, with the recommendation that a new extension vocabulary be added.

evanp · Apr 19 '23 16:04