webpub-manifest Identify proprietary DRM schemes

While we're already capable of identifying LCP using the EPUB profile and its scheme element, there are other DRM schemes out there which we're currently not capable of identifying.

A user could import an EPUB file that's protected by other schemes such as:

Adobe ACS
Kobo DRM
Google DRM
Apple DRM

IMO we should be able to identify all these schemes in RWPM since they could prohibit users from accessing their files. Without this info, Readium Mobile and Desktop could return an unknown error (or worse) which is never ideal.

For each of these DRM schemes we should document:

how to identify them (URI)
and how to detect them (this probably belongs more in architecture than here)

This information should be added to an appendix of the EPUB Profile.

Any thoughts on this @llemeurfr @danielweck @qnga @mickael-menu ?

Aug 12 '20 09:08 HadrienGardeur

This sounds like a better user experience for sure. That would also solve our dilemma about when to resolve the scheme between the EPUB Parser and Content Protection, if we consider that it's the responsibility of the EPUB parser to detect as many DRMs as possible.

Aug 12 '20 12:08 mickael-menu

I think attempting to support everything is always challenging and each DRM seems to be very specific. To implement ACS support I had to write a custom encryption.xml parser to retrieve non-standard data required by the ACS Connector. Do we want the EPUB parser and the Encryption model to take care of this? I don't think so.

On the contrary, I was going to suggest to remove scheme and profile properties because they are LCP-specific, not link-specific (as far as I know) and, as Mickaël said, we didn't know when and how to fit them. Or at least, we may not use them in non-native RWPM.

Actually, there is another way to return meaningful errors to users. Readium allows only one ContentProtection per publication, so that single instance could be responsible for checking that no resource is encrypted with an unknown algorithm. If one is, Resource.read would return a Forbidden error carrying the algorithm URI. When no ContentProtection is built during parsing, we could use a fallback one that checks that no resource is encrypted and, if one is, attempts to guess which DRM is used.
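
A minimal sketch of that check, in Python for brevity. The names Link, ForbiddenError, KNOWN_ALGORITHMS and read are hypothetical stand-ins for illustration, not the actual Readium API, and the list of known algorithms is only an assumption:

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumption: algorithms that the active content protection can handle.
KNOWN_ALGORITHMS: Set[str] = {
    "http://www.idpf.org/2008/embedding",           # IDPF font obfuscation
    "http://ns.adobe.com/pdf/enc#RC",               # Adobe font obfuscation
    "http://www.w3.org/2001/04/xmlenc#aes256-cbc",  # AES-256-CBC
}


class ForbiddenError(Exception):
    """Hypothetical error raised when a resource cannot be decrypted."""

    def __init__(self, algorithm: str):
        super().__init__(f"Resource encrypted with unsupported algorithm: {algorithm}")
        self.algorithm = algorithm


@dataclass
class Link:
    href: str
    algorithm: Optional[str] = None  # None means the resource is not encrypted


def read(link: Link, known: Set[str] = KNOWN_ALGORITHMS) -> str:
    """Stand-in for Resource.read: refuse resources using an unknown algorithm."""
    if link.algorithm is not None and link.algorithm not in known:
        raise ForbiddenError(link.algorithm)
    return f"<contents of {link.href}>"
```

With this shape, the app can catch the error and show the algorithm URI to the user instead of a generic failure.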

I may have drifted away from the initial topic because I cannot completely figure out what the issue is.

Aug 12 '20 13:08 qnga

I've been convinced from the start that having scheme and profile at the level of each encrypted resource is not useful, and that this information should be held at the level of the global manifest metadata.

If we go the route of having a "drm" / "protectedBy" property with values like "lcp" "acs" ..., the value currently in "scheme" would go there, along with a sub-property "profile" for dealing with the lcp case today, maybe other types of drm tomorrow.

"drm": { "scheme":"http://readium.org/2014/01/lcp", "profile":"http://readium.org/lcp/basic-profile"}
"drm": { "scheme":"http://adobe.com/drm/acs"}

Now, apart from LCP and ACS, we won't find other types of DRM-protected ebooks in the wild, will we? Ebooks protected by Kobo, Apple, Google or Amazon are not easily exported from their walled gardens.
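
To illustrate how a reading app might consume such a manifest-level "drm" property, here is a rough Python sketch. The property itself is only the proposal above, not part of RWPM, and describe_drm and SCHEME_NAMES are hypothetical names. The ACS URI is the illustrative one from the example, not a registered identifier:

```python
import json
from typing import Optional

# Assumption: display names keyed by the scheme URIs used in the examples above.
SCHEME_NAMES = {
    "http://readium.org/2014/01/lcp": "LCP",
    "http://adobe.com/drm/acs": "Adobe ACS",
}


def describe_drm(manifest_json: str) -> Optional[str]:
    """Return a readable description of the proposed metadata.drm property."""
    metadata = json.loads(manifest_json).get("metadata", {})
    drm = metadata.get("drm")
    if drm is None:
        return None  # the publication declares no DRM
    scheme = drm.get("scheme")
    name = SCHEME_NAMES.get(scheme, f"unknown scheme ({scheme})")
    profile = drm.get("profile")
    return f"{name}, profile {profile}" if profile else name
```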

Aug 12 '20 13:08 llemeurfr

Do we want the EPUB parser and the Encryption model to take care of this?

We have zero plan for supporting these DRMs on the decryption side, but IMO there's a real use case in being able to identify them and provide better error handling.

On the contrary, I was going to suggest to remove scheme and profile properties because they are LCP-specific, not link-specific (as far as I know) and, as Mickaël said, we didn't know when and how to fit them. Or at least, we may not use them in non-native RWPM.

That's incorrect: scheme identifies a DRM and is therefore not LCP-specific. The same goes for profiles, which are widely used across many different serialization formats and media types.

I've been convinced from the start that having scheme and profile at the level of each encrypted resource is not useful, and that this information should be held at the level of the global manifest metadata.

That's not the same level of information. Your average LCP protected file will have:

fonts obfuscated with the IDPF or Adobe algorithms
LCP protected resources (most of the content)
non-protected resources (cover, table of contents)

We need to have that information at a resource level, which is consistent with how this information is expressed in EPUB as well (albeit poorly expressed in the case of EPUB).

While I'm certainly in favor of a helper method at a publication level that will process resource-level information and return a value for the publication, this is quite different from adopting a lossy information model like the one you're suggesting.
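
Such a helper could look roughly like the Python sketch below. It assumes each link carries its encryption details under properties.encrypted with algorithm and scheme keys. The function name publication_schemes and the sample hrefs are made up for illustration:

```python
from typing import Iterable, Set


def publication_schemes(links: Iterable[dict]) -> Set[str]:
    """Collect the distinct scheme URIs declared on individual resources."""
    schemes: Set[str] = set()
    for link in links:
        encrypted = link.get("properties", {}).get("encrypted")
        # Obfuscated fonts have an algorithm but no scheme, so they are skipped.
        if encrypted and encrypted.get("scheme"):
            schemes.add(encrypted["scheme"])
    return schemes


# A typical LCP-protected file, as described above.
resources = [
    {"href": "font.otf", "properties": {"encrypted": {
        "algorithm": "http://www.idpf.org/2008/embedding"}}},
    {"href": "chapter1.xhtml", "properties": {"encrypted": {
        "algorithm": "http://www.w3.org/2001/04/xmlenc#aes256-cbc",
        "scheme": "http://readium.org/2014/01/lcp"}}},
    {"href": "cover.jpg"},
]
```

The resource-level data stays intact, and the publication-level value is derived from it rather than stored in its place.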

Aug 12 '20 13:08 HadrienGardeur

To implement ACS support I had to write a custom encryption.xml parser to retrieve non-standard data required by the ACS Connector. Do we want the EPUB parser and the Encryption model to take care of this? I don't think so.

I think the proposal is not meant to replace any custom parsing from the DRMs (we still need to parse the LCPL for example with LCP). It's more about indicating which DRM is used, for the well-known ones.

Aug 12 '20 13:08 mickael-menu

I think the proposal is not meant to replace any custom parsing from the DRMs (we still need to parse the LCPL for example with LCP). It's more about indicating which DRM is used, for the well-known ones.

Correct. I see this as a DRM counterpart to the work that has been done to better identify formats in our SDK.

Aug 12 '20 13:08 HadrienGardeur

Although I think this is doable for the scheme, I'm still not sure that the EPUBParser should fill in the profile. For example, with LCP it means parsing the license.lcpl.

Aug 12 '20 14:08 mickael-menu

We need to have that information at a resource level, which is consistent with how this information is expressed in EPUB as well (albeit poorly expressed in the case of EPUB).

You're talking about the algorithm property. The fact remains that scheme and profile are not link-specific. We can imagine that encryption.algorithm would say whether the link is encrypted or obfuscated, and a more global information would talk about encryption scheme.

If scheme and profile are widely used, I think making the EPUB parser responsible for filling them in would go against the extensibility provided by the ContentProtection API.

I'd like to see a more concrete specification of the better error handling that DRM information inside RWPM would provide. It seems to me that the ContentProtection API already fits the need.

Aug 12 '20 14:08 qnga

(obfuscated, encrypted, non-encrypted) We need to have that information at a resource level

Isn't the algorithm property giving this information?

and a more global information would talk about encryption scheme.

Just a clarification: it is not an encryption scheme, since the algorithm says everything about the encryption of the resource. It is a DRM scheme, i.e. the way to find the decryption key.

Aug 12 '20 14:08 llemeurfr

Isn't the algorithm property giving this information?

It only provides partial information and answers two specific questions:

is it encrypted?
if so, which algorithm is used?

You need to know the scheme and the profile as well to have the whole picture and answer the final question:

how can I obtain the key necessary to decrypt this resource?

Just a clarification: it is not an encryption scheme, since the algorithm says everything about the encryption of the resource. It is a DRM scheme, i.e. the way to find the decryption key.

In most cases that's correct. But one could imagine use cases that are not DRM related.

For example a stronger approach to protect fonts, without going full DRM on it.

Aug 12 '20 15:08 HadrienGardeur

Let me remind everyone of the existence of the ContentProtectionService, which is designed to be exposed both as a native API and served over HTTP.

Aug 12 '20 17:08 qnga
