
Sorting keys are language dependent

Open qnga opened this issue 5 years ago • 15 comments

Currently RWPM supports sortAs in subjects, titles and contributors independently of their localized names. But a sorting key is in fact language-dependent and should be supported as such.

I think title, for example, should be usable as follows:

title: {
  "en": "Around the World in Eighty Days",
  "fr":  {
     "name": "Le Tour du monde en quatre-vingts jours",
     "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

Or

title: {
  "name": "Le Tour du monde en quatre-vingts jours",
  "sortAs": "Tour du monde en quatre-vingts jours, Le"
}

Or

title: {
  "name": "Around the World in Eighty Days"
}

In the Kotlin app, everything is ready for that. We have a LocalizedString object that contains Translation objects, each of which may contain a sorting key alongside the canonical string.

qnga avatar Feb 04 '20 16:02 qnga

I understand your point but that's really not the usage for this.

Sorting keys are mostly used by reading apps to handle the following actions in a bookshelf:

  • order publications by title or author
  • filter publications for a specific author/subject/series

For this specific use case, having multiple sorting keys is more confusing than helpful.
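As a rough illustration of those two bookshelf actions, here is a minimal Python sketch. The `Publication` class, its field names and the helper functions are hypothetical and not part of RWPM or any Readium toolkit; the sketch only assumes that each title and author carries at most one `sortAs` string, and "Example Author" is a placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Publication:
    # Hypothetical in-memory model, not an RWPM or Readium type.
    title: str
    author: str
    title_sort_as: Optional[str] = None
    author_sort_as: Optional[str] = None


def title_key(pub: Publication) -> str:
    # Fall back to the display title when no sorting key is provided.
    return (pub.title_sort_as or pub.title).casefold()


def order_by_title(shelf: List[Publication]) -> List[Publication]:
    return sorted(shelf, key=title_key)


def filter_by_author(shelf: List[Publication], author_sort_as: str) -> List[Publication]:
    # Compare on the sorting key (or the display name when it is missing).
    wanted = author_sort_as.casefold()
    return [p for p in shelf if (p.author_sort_as or p.author).casefold() == wanted]


shelf = [
    Publication("Le Tour du monde en quatre-vingts jours", "Jules Verne",
                "Tour du monde en quatre-vingts jours, Le", "Verne, Jules"),
    Publication("Around the World in Eighty Days", "Jules Verne",
                None, "Verne, Jules"),
    Publication("The Combat Baker and Automaton Waitress", "Example Author",
                "Combat Baker and Automaton Waitress, The", None),
]

ordered_titles = [p.title for p in order_by_title(shelf)]
verne_titles = [p.title for p in filter_by_author(shelf, "Verne, Jules")]
```

With a single key per publication, both operations stay a one-line `sorted` or filter, which is the simplicity being argued for here.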

When we look at EPUB files or OPDS feeds, it's already a miracle when they include both a sorting key and multiple translations. I don't think we can ever expect to get sort keys for each translation (I'm not even sure if that's doable with EPUB 3.x).

It's also worth pointing out that behind the scenes, we're actually working with JSON-LD and not JSON.

The following example would not be proper JSON-LD, since even with JSON-LD 1.1 language maps can only support literals and not objects:

title: {
  "en": "Around the World in Eighty Days",
  "fr":  {
     "name": "Le Tour du monde en quatre-vingts jours",
     "sortAs": "Tour du monde en quatre-vingts jours"
  }
}

This limitation means that currently, we can't really support @direction either (see #33)
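To make the literal-only constraint concrete, here is a small Python check. It is only a sketch of the rule described above, not a JSON-LD processor: the function name `non_literal_languages` is made up for this example, and it assumes a language map value is acceptable when it is a string or a list of strings.

```python
from typing import Any, Dict, List


def non_literal_languages(language_map: Dict[str, Any]) -> List[str]:
    """Return the language tags whose value is not a string literal
    (or a list of string literals)."""
    bad = []
    for lang, value in language_map.items():
        if isinstance(value, str):
            continue
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue
        bad.append(lang)
    return bad


# The title proposed at the top of this issue: "fr" holds an object.
proposed_title = {
    "en": "Around the World in Eighty Days",
    "fr": {
        "name": "Le Tour du monde en quatre-vingts jours",
        "sortAs": "Tour du monde en quatre-vingts jours",
    },
}

# A plain language map, as RWPM already allows for title.
plain_title = {
    "en": "Around the World in Eighty Days",
    "fr": "Le Tour du monde en quatre-vingts jours",
}

rejected = non_literal_languages(proposed_title)
accepted = non_literal_languages(plain_title)
```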

HadrienGardeur avatar Feb 04 '20 16:02 HadrienGardeur

Indeed, there seem to be few use cases. Maybe a bilingual edition? A more JSON-LD compliant alternative would be to make sortAs a language map, as title is now. The shortcut syntax

"sortAs": "Tour du monde en quatre-vingts jours, Le"

would still be able to be used.
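A minimal sketch of how a parser might accept both forms, written in Python for brevity. The function name `normalize_sort_as` is hypothetical, and storing the shortcut string under the BCP 47 tag `und` (undetermined) is an assumption of this sketch, not something RWPM specifies.

```python
from typing import Dict, Union

# Assumption: the shortcut string is filed under "und" (undetermined language).
UNDETERMINED = "und"


def normalize_sort_as(value: Union[str, Dict[str, str]]) -> Dict[str, str]:
    """Turn either the shortcut string or a language map into a language map."""
    if isinstance(value, str):
        return {UNDETERMINED: value}
    if isinstance(value, dict):
        # Copy so callers cannot mutate the parsed manifest by accident.
        return dict(value)
    raise TypeError("sortAs must be a string or a language map")


from_shortcut = normalize_sort_as("Tour du monde en quatre-vingts jours, Le")
from_map = normalize_sort_as({"fr": "Tour du monde en quatre-vingts jours, Le"})
```

Downstream code then only ever deals with one shape, whichever syntax the manifest used.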

qnga avatar Feb 04 '20 17:02 qnga

We could also align with the W3C Publication Manifest, which uses an array of LocalizableString objects to support text direction.

qnga avatar Feb 04 '20 19:02 qnga

I'm a bit wary of revisiting this right now:

  • we can cover the most common use cases (multiple Japanese scripts, bilingual editions)
  • language maps provide us with a much nicer JSON syntax than the current W3C approach
  • I don't know how an app would actually implement multiple sorting keys

HadrienGardeur avatar Feb 04 '20 23:02 HadrienGardeur

This comes at a perfect time. I'm currently implementing such a system, and it would be nice to fit that data in a webpub. Since I hadn't gotten to the point of generating them yet, I hadn't even considered the fact that sortAs can only be a string in the schema. Is there a way I could fit all this data in? Internally, the data is given like this:

[
    {
        "name": "The Combat Baker and Automaton Waitress",
        "sortAs": "Combat Baker and Automaton Waitress, The",
        "language": "en"
    },
    {
        "name": "戦うパン屋と機械じかけの看板娘",
        "sortAs": "タタカウパンヤトオートマタンウェイトレス",
        "language": "ja"
    }
]

> I don't know how an app would actually implement multiple sorting keys

In my use case, the sorting key for the user's publisher language or client language is used.
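One way an app could do that, sketched in Python. The function name `pick_sort_key` and the fallback to the first available entry are assumptions made for illustration, not a description of the actual system.

```python
from typing import Dict, Optional, Sequence


def pick_sort_key(sort_as: Dict[str, str], preferred: Sequence[str]) -> Optional[str]:
    """Return the sorting key for the first preferred language that has one."""
    for lang in preferred:
        if lang in sort_as:
            return sort_as[lang]
    # Assumption: when nothing matches, use the first key in the map, if any.
    return next(iter(sort_as.values()), None)


sort_as = {
    "en": "Combat Baker and Automaton Waitress, The",
    "ja": "タタカウパンヤトオートマタンウェイトレス",
}

japanese_first = pick_sort_key(sort_as, ["ja", "en"])
no_match = pick_sort_key(sort_as, ["fr"])
empty = pick_sort_key({}, ["en"])
```

The `preferred` list would come from the publisher or client language settings mentioned above, most important first.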

chocolatkey avatar Feb 05 '20 04:02 chocolatkey

Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

qnga avatar Feb 05 '20 08:02 qnga

Thanks @chocolatkey for chiming in and proving me wrong regarding use cases 😉

Could you provide some additional context for this use case? It looks to me that you're trying to do the following:

  • this is a title where both the English and Japanese names of the publication are available
  • for the Japanese title, the "real" title has a mix of Kanji, Hiragana and Katakana
  • the sorting key for the Japanese title translates the real title to Katakana only

If we move sortAs away from being a literal and add the same language map approach that we use for title and name, this could be represented as:

{
  "title": {
    "en": "The Combat Baker and Automaton Waitress",
    "ja": "戦うパン屋と機械じかけの看板娘"
  },
  "sortAs": {
    "en": "Combat Baker and Automaton Waitress, The",
    "ja": "タタカウパンヤトオートマタンウェイトレス"
  }
}
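For illustration, the internal array shown earlier can be folded into this shape with a few lines of Python. The function name `to_language_maps` is hypothetical; the input keys (`name`, `sortAs`, `language`) are the ones from that array, and treating `sortAs` as optional is an assumption.

```python
from typing import Dict, List


def to_language_maps(entries: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]:
    """Fold per-language entries into one title map and one sortAs map."""
    title: Dict[str, str] = {}
    sort_as: Dict[str, str] = {}
    for entry in entries:
        lang = entry["language"]
        title[lang] = entry["name"]
        # Assumption: an entry may omit its sorting key.
        if "sortAs" in entry:
            sort_as[lang] = entry["sortAs"]
    return {"title": title, "sortAs": sort_as}


entries = [
    {
        "name": "The Combat Baker and Automaton Waitress",
        "sortAs": "Combat Baker and Automaton Waitress, The",
        "language": "en",
    },
    {
        "name": "戦うパン屋と機械じかけの看板娘",
        "sortAs": "タタカウパンヤトオートマタンウェイトレス",
        "language": "ja",
    },
]

metadata = to_language_maps(entries)
```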

HadrienGardeur avatar Feb 05 '20 10:02 HadrienGardeur

> Very interesting! Could we know a little more about your use case? If the publication is monolingual, why do you wish to allow multiple languages for metadata?

In the system I am creating, the publishers (of translated Japanese doujin content) are going to have the ability to privately share a review copy of the publications with the original authors, and potentially have a mini-library for them. The original authors are usually not well-versed in English, so the original title needs to be present so the original and localized title can be displayed side-by-side. The katakana is included there for filtering and sorting purposes for when the titles are displayed in Japanese, both in that private frontend as well as the admin backend.

> If we move sortAs away from being a literal and add the same language map approach that we use for title and name [...]

I think this would be a good idea, because it's backwards-compatible with the existing schema. My cases tend to be edge cases (which is probably good for probing the limits of the standard), but most people will just need "sortAs": "Single String, The". The way @HadrienGardeur represented my data in the example snippet is perfect.

chocolatkey avatar Feb 05 '20 19:02 chocolatkey

OK then, let's vote on this issue using 👍 and 👎 on this message. I'll also bring it up in our weekly call.

Who's in favor of turning sortAs into a language map?

HadrienGardeur avatar Feb 08 '20 10:02 HadrienGardeur

I drafted a proposal: https://github.com/qnga/webpub-manifest/blob/proposal/sortAs/proposals/001-multilingual-sortAs.md

@chocolatkey Would you add something more precise about your use case?

Here is an internal PR for suggestions and comments: https://github.com/qnga/webpub-manifest/pull/1

qnga avatar Apr 15 '20 11:04 qnga

@qnga what additional information would you like about my use case besides what I said previously? I can't think of much I didn't say.

chocolatkey avatar Apr 16 '20 05:04 chocolatkey

I was suggesting that you might explain it right in the proposal. But it might not be necessary.

qnga avatar Apr 16 '20 07:04 qnga

@qnga aha now it's clear. Would you like me to fork your fork and submit a PR or comment in your internal PR?

chocolatkey avatar Apr 16 '20 09:04 chocolatkey

Note that the Go implementation of a Publication already has a "MultiLanguage" struct, which is currently applied to the title and subtitle properties and could easily be applied to sortAs as well. The change would therefore not be harmful to the Go code.

llemeurfr avatar Apr 16 '20 10:04 llemeurfr

The easiest way is to add suggestion snippets in comments on the PR: https://github.com/qnga/webpub-manifest/pull/1

Thanks Laurent for the feedback about the Go implementation.

qnga avatar Apr 16 '20 12:04 qnga