Spec does not clarify non-functional natural language values when mapped

Open cjslep opened this issue 7 years ago • 6 comments

Please Indicate One:

  • [ ] Editorial
  • [ ] Question
  • [X] Feedback
  • [X] Blocking Issue
  • [ ] Non-Blocking Issue

Please Describe the Issue:

In https://www.w3.org/TR/activitystreams-core/#naturalLanguageValues the language-mapped forms are exemplified:

Accordingly, in the JSON serialization, the terms "name", "summary", and "content" represent the JSON string forms; and the terms "nameMap", "summaryMap", and "contentMap" represent the object forms.

An example provided is Example 22:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": {
    "en": "This is the title",
    "fr": "C'est le titre",
    "es": "Este es el título"
  }
}

However, according to https://www.w3.org/TR/activitystreams-vocabulary/#properties, none of these properties (name, summary, and content) is marked 'functional'. Thus, having multiple values for them is valid.

Therefore, the following message is within spec:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "name": [ "This is the title", "This is another title" ]
}
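Because name is not functional, a consumer has to accept either a single string or a JSON array for it. A minimal Python sketch of that normalization, assuming the object has already been parsed with `json.loads`; the helper name `as_list` is hypothetical and not part of any AS2 library:

```python
import json


def as_list(value):
    """Return a non-functional property value as a list.

    A missing property becomes [], a single value becomes a one-element
    list, and a JSON array is returned unchanged.
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


doc = json.loads("""
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "name": [ "This is the title", "This is another title" ]
}
""")

names = as_list(doc.get("name"))
print(names)  # ['This is the title', 'This is another title']
```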

However, the spec does not describe how this should be handled in map form, if at all. Two options that would handle it include:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": {
    "en": [ "This is the title", "This is another title" ],
    "fr": [ "C'est le titre", "C'est un autre titre" ],
    "es": [ "Este es el título", "Este es otro título" ]
  }
}

and

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Object",
  "nameMap": [
    {
      "en": "This is the title",
      "fr": "C'est le titre",
      "es": "Este es el título"
    },
    {
      "en": "This is another title",
      "fr": "C'est un autre titre",
      "es": "Este es otro título"
    }
  ]
}

A third implementation could ignore such values altogether as "unhandled", and all three could claim to follow the spec because of the lack of guidance.
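To show what a consumer would have to do without guidance, here is a minimal Python sketch that accepts Example 22 and both proposed shapes and flattens them into (language, value) pairs. The function name is hypothetical, and accepting shapes 2 and 3 is an assumption; the spec only defines shape 1:

```python
def name_map_pairs(name_map):
    """Flatten a nameMap into a list of (language, value) pairs.

    Accepts three shapes (only the first is defined by the spec):
      1. {"en": "a", "fr": "b"}                      (Example 22)
      2. {"en": ["a", "c"], "fr": ["b", "d"]}        (first proposal)
      3. [{"en": "a", "fr": "b"}, {"en": "c", ...}]  (second proposal)
    """
    maps = name_map if isinstance(name_map, list) else [name_map]
    pairs = []
    for lang_map in maps:
        for lang, value in lang_map.items():
            values = value if isinstance(value, list) else [value]
            pairs.extend((lang, v) for v in values)
    return pairs


form2 = {
    "en": ["This is the title", "This is another title"],
    "fr": ["C'est le titre", "C'est un autre titre"],
}
form3 = [
    {"en": "This is the title", "fr": "C'est le titre"},
    {"en": "This is another title", "fr": "C'est un autre titre"},
]

# Both proposals carry the same pairs, only in a different order.
print(sorted(name_map_pairs(form2)) == sorted(name_map_pairs(form3)))  # True
```

Flattening discards which translations belong together, and that grouping is exactly what the two proposals encode differently (by array index versus by object), so a real consumer still needs the spec to say which one is meaningful.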

cjslep, Jan 02 '18

[Edit: this became clear from the 437 comment.] I see, this refers to the document https://www.w3.org/TR/activitystreams-vocabulary/#dfn-content

  • content and contentMap (and the other pairs) should perhaps be split into separate table rows (I'd prefer one row per property)

  • and in the document https://www.w3.org/TR/activitystreams-vocabulary/, every occurrence of "multiple language-tagged values" could link to Section 4.7 of the Activity Streams Core

When I was initially reading the Vocabulary document, I was also unaware that, e.g., content and contentMap are mutually exclusive and that there is a special "und" language tag for values whose language is unknown.
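To make the "und" (undetermined) language tag concrete, a minimal Python sketch of how a consumer might pick one display value from a language map such as contentMap. The function name and the fallback order are assumptions, not something the spec prescribes, and tags are compared exactly rather than with full BCP 47 matching:

```python
def pick_language_value(lang_map, preferred):
    """Pick one display value from a language map such as contentMap.

    Hypothetical policy: try each preferred tag in order, then "und"
    (undetermined language), then the first entry in the map.
    Tags are compared exactly; real BCP 47 matching is more lenient.
    Returns None for an empty map.
    """
    for tag in [*preferred, "und"]:
        if tag in lang_map:
            return lang_map[tag]
    return next(iter(lang_map.values()), None)


content_map = {"fr": "C'est le titre", "und": "Title"}
print(pick_language_value(content_map, ["fr", "en"]))  # C'est le titre
print(pick_language_value(content_map, ["de"]))        # Title
```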

sebilasse, Jan 17 '18

@cjslep please also note that for content / contentMap it gets worse:

How should mediaType behave with multiple content? mediaType is marked functional! https://www.w3.org/TR/activitystreams-vocabulary/#dfn-mediatype

This does not even allow me to mix, e.g., HTML and Markdown content…

sebilasse, Jan 19 '18

@sebilasse

How should mediaType behave with multiple content?

Is this what you mean?

{
  "type": "Note",
  "mediaType": "text/plain",
  "content": [
    "<!doctype html>some html",
    "{}"
  ]
}

How do we interpret the multiple values of content?

I do believe this is a valid bug. The quickest thing we could do is at least to add to the description of 'mediaType':

If `content` or `contentType` has multiple values, then the meaning of a single `mediaType` value is undefined.
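A minimal Python sketch of how a consumer might apply the wording proposed above. It assumes that `contentType` in that sentence refers to `contentMap` (AS2 defines no `contentType` property), and it falls back to `text/html` because the Vocabulary describes content as HTML by default; the function names are hypothetical:

```python
def is_multi_valued(value):
    """True when a JSON value is an array holding more than one item."""
    return isinstance(value, list) and len(value) > 1


def effective_media_type(obj):
    """Return the mediaType that applies to obj's content, or None if undefined.

    Follows the proposed wording, reading `contentType` as `contentMap`
    (an assumption). Content is HTML by default per the AS2 Vocabulary.
    """
    multi = is_multi_valued(obj.get("content"))
    content_map = obj.get("contentMap")
    if isinstance(content_map, dict):
        multi = multi or any(is_multi_valued(v) for v in content_map.values())
    elif isinstance(content_map, list):
        multi = multi or len(content_map) > 1
    if multi:
        return None  # a single mediaType cannot describe several values
    return obj.get("mediaType", "text/html")


note = {
    "type": "Note",
    "mediaType": "text/plain",
    "content": ["<!doctype html>some html", "{}"],
}
print(effective_media_type(note))  # None
```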

Separately...

  • I think the best solution to this is to add a new Type to the range of the content property. Allow values like this:

{
  "type": "StringContent",
  "mediaType": "application/json",
  "string": "{}"
}

Or, honestly, just allow Links in the range of content, so that this is allowed:

{
  "type": "Link",
  "href": "data:application/json;charset=utf-8;base64,e30="
}

And then deprecate 'mediaType' on Objects.
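A minimal Python sketch of what a consumer would do if content could hold typed values as proposed here. `StringContent` and its `string` property are the hypothetical type from this comment, not part of AS2; the data: URL handling follows RFC 2397, and plain strings are assumed to be HTML, the Vocabulary's default for content:

```python
import base64
from urllib.parse import unquote_to_bytes


def parse_data_url(href):
    """Split an RFC 2397 data: URL into (media type, payload bytes)."""
    if not href.startswith("data:"):
        raise ValueError("not a data: URL")
    header, sep, data = href[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URL has no comma")
    params = header.split(";")
    is_base64 = params[-1].lower() == "base64"
    if is_base64:
        params = params[:-1]
    # RFC 2397: an omitted media type means text/plain;charset=US-ASCII.
    if not params[0]:
        params[0] = "text/plain"
        if len(params) == 1:
            params.append("charset=US-ASCII")
    payload = base64.b64decode(data) if is_base64 else unquote_to_bytes(data)
    return ";".join(params), payload


def content_item(item, default_media_type="text/html"):
    """Return (media type, bytes) for one value of content.

    Shapes handled (the last two are proposals, not part of AS2):
      - a plain string, taken to use default_media_type;
      - a hypothetical {"type": "StringContent", "mediaType", "string"} object;
      - a Link whose href is a data: URL.
    """
    if isinstance(item, str):
        return default_media_type, item.encode("utf-8")
    if item.get("type") == "StringContent":
        media_type = item.get("mediaType", default_media_type)
        return media_type, item["string"].encode("utf-8")
    if item.get("type") == "Link":
        return parse_data_url(item["href"])
    raise ValueError(f"unsupported content value: {item!r}")


link = {"type": "Link", "href": "data:application/json;charset=utf-8;base64,e30="}
print(content_item(link))  # ('application/json;charset=utf-8', b'{}')
```

Because each value carries its own media type, a Note could mix HTML and Markdown content without a single functional mediaType getting in the way.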

Could use editorial feedback @cwebber

gobengo, Jan 24 '18

@gobengo This is exactly what I meant.

I would go for the "best solution" 😁 If multiple content items are provided, each one should have its own content encoding and media type.

See, e.g., how the JSON Schema spec deals with it: http://json-schema.org/latest/json-schema-validation.html#rfc.section.8.3

The default could be

{
    "content": "foo",
    "encoding": "8bit",
    "mediaType": "text/html"
}

but it could also be an image

{
    "content": "bar",
    "encoding": "base64",
    "mediaType": "image/png"
}

where encoding can be any RFC 2045 value: "7bit" | "8bit" | "binary" | "quoted-printable" | "base64" | ietf-token | x-token
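A minimal Python sketch of decoding one content item by its proposed encoding value, using only the standard library. The function name is hypothetical, `encoding` is the per-item property proposed in this comment rather than an AS2 property, and ietf-token/x-token values are rejected:

```python
import base64
import quopri


def decode_content(value, encoding="8bit"):
    """Decode a JSON content string according to an RFC 2045 encoding token.

    Returns bytes. Tokens are compared case-insensitively, as RFC 2045
    requires; ietf-token and x-token values are not supported here.
    """
    token = encoding.lower()
    if token == "base64":
        return base64.b64decode(value)
    if token == "quoted-printable":
        return quopri.decodestring(value.encode("ascii"))
    if token in ("7bit", "8bit", "binary"):
        # Identity encodings: the JSON string is taken as UTF-8 text as-is.
        return value.encode("utf-8")
    raise ValueError(f"unsupported encoding: {encoding!r}")


print(decode_content("caf=C3=A9", "quoted-printable").decode("utf-8"))  # café
```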


@cjslep FYI: I made JSON Schemas: https://github.com/redaktor/ActivityPubSchema

sebilasse, Jan 25 '18

I think we want the same thing. There's work to be done to clarify what the 'range' of 'content' should be. Object is probably fine, but might be weirdly broad. Perhaps an extension should define a Content type and a related StringContent type. Or take a look at oa:TextualBody.

gobengo, Jan 25 '18

The Vocabulary document does not mark these properties as "functional", but its definitions refer to each property in the singular. For example:

  • content: "The content or textual representation of the Object"
  • name: "A simple, human-readable, plain-text name for the object."
  • summary: "A natural language summarization of the object encoded as HTML."

None of the examples have multiple values for these properties, and there is no guidance on how consumers should handle multiple values here.

I think the resolution is to add the Functional flag for these properties in the ERRATA, and to document a best practice for handling multiple values if they are found.
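One possible best practice, sketched in Python under stated assumptions: keep the first value of any array and warn about the rest, for both the plain property and each language entry of its Map form. The "first value wins" policy, the `FUNCTIONAL_PROPS` list, and the function names are hypothetical; the comment above proposes documenting a policy but does not specify one:

```python
import warnings

# Assumption: the properties this comment proposes to flag as functional.
FUNCTIONAL_PROPS = ("name", "summary", "content")


def _first(label, value):
    """Keep the first item of an array value, warning about any dropped items."""
    if isinstance(value, list):
        if len(value) > 1:
            warnings.warn(f"{label}: keeping 1 of {len(value)} values")
        return value[0] if value else None
    return value


def coerce_functional(obj, props=FUNCTIONAL_PROPS):
    """Return a copy of obj with at most one value per property and language.

    Hypothetical "first value wins" policy, not taken from the spec or
    errata. The input object is not modified.
    """
    result = dict(obj)
    for prop in props:
        if prop in result:
            result[prop] = _first(prop, result[prop])
        map_key = prop + "Map"
        if isinstance(result.get(map_key), dict):
            result[map_key] = {
                lang: _first(f"{map_key}.{lang}", v)
                for lang, v in result[map_key].items()
            }
    return result


obj = {
    "type": "Object",
    "name": ["This is the title", "This is another title"],
    "nameMap": {"en": ["This is the title", "This is another title"],
                "fr": "C'est le titre"},
}
print(coerce_functional(obj)["name"])  # This is the title
```

Keeping the first value is only one option; rejecting such objects outright is another, and choosing between them is what a documented best practice would settle.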

evanp, Jun 21 '23