What Packaging Format/Style Should a PWP Use?
PWP will require the selection of some sort of packaging format in order to be a Packaged Web Publication.
Some options currently under consideration include, but are not limited to:
- Web Packaging (WICG Web Packaging Format, J. Yasskin's drafts, and Web Packaging Format Explainer)
- EPUB Open Container Format (OCF) 3.1 W3C Member Submission
- Zip/OCF
- Concise Binary Object Representation (CBOR) RFC7049
- SQLite
All of these have pros and cons. For example, Web Packaging is not finalised, the CBOR specification precludes inclusion of a general compression scheme (although one could add one on top of CBOR), and SQLite is not a standard of a recognised body.
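To make the Zip/OCF option a bit more concrete, here is a minimal Python sketch of an OCF-style container. It relies on the OCF rule that the mimetype entry comes first in the archive and is stored uncompressed. The function name build_ocf_like_zip is hypothetical, the application/epub+zip value is EPUB's media type used only as a placeholder (no PWP media type has been defined), and the META-INF/container.xml file that real OCF requires is left out for brevity.

```python
import io
import zipfile


def build_ocf_like_zip(resources):
    """Pack a mapping of {path: bytes} into an OCF-style ZIP held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # OCF puts the mimetype entry first and stores it without compression,
        # so a reader can sniff the container type from the first bytes.
        # Placeholder value: EPUB's media type, not a defined PWP one.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            b"application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        # Everything else can be compressed per entry.
        for path, data in resources.items():
            archive.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    package = build_ocf_like_zip({"index.html": b"<p>Hello, PWP</p>"})
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        print([info.filename for info in archive.infolist()])
```

Note that compression here is per entry and native to ZIP, which is one of the differences from CBOR mentioned above.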
Actually, I would argue that the PWP specification should not mandate a single packaging format - but instead should address all the requirements for packaging in any valid format. That way, we can have profiles of PWP - such as EPUB4 and NextGen PDF - for specific use cases.
@lrosenthol I do not think it is possible to publish a PWP without providing a packaging format, also in light of the fact that the differences between EPUB4 and PWP may become more complex (see, e.g., my comment elsewhere). There is a clear need (e.g., for archival purposes, or for private use) for a "profile" (whatever that means) that is 'simply' a packaging of a WP content, without being bound to other restrictions of, say, EPUB4. Such a profile would need a specified packaging format. It would be possible to cut the PWP document into two, but in view of the size of the document (I do not expect these to be long documents) that would be overkill.
I would think, rather, that the issue is how we define PWP conformance (see #12). It should be possible to be conformant with the PWP spec's general sections (whatever they will be), and not with the packaging part.
(There are examples of this. Although the situation is fairly different in other respects, RDFa 1.1 Core defines several classes of conformance within the document. Some of those are defined within that spec, while others, like HTML+RDFa 1.1, are formally defined in separate documents.)
> Actually, I would argue that the PWP specification should not mandate a single packaging format - but instead should address all the requirements for packaging in any valid format. That way, we can have profiles of PWP - such as EPUB4 and NextGen PDF - for specific use cases.
Another approach would be to have a default packaging format, which would open the door to other options as well.
The idea of a default, but not mandated, format (which I think both of you are suggesting) sounds like a great approach! +1 to that!
And on that note, to the original question, I would say that whatever we choose needs to meet a few key requirements:
- is itself an existing (or in development) open standard
- is extensible (without breaking compliance)
- natively supports (or enables) a digital signature
Those to me are the only MUSTs.
Compression and encryption can be layered on top of pretty much anything - they need not be native requirements. Signatures are different: the format itself has to enable them, as some package format designs can actually work against signing.
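As a rough illustration of that layering argument, here is a Python sketch in which a manifest of per-resource digests is signed, and compression is then applied and removed without affecting verification. Everything named here (build_manifest, sign_manifest, verify) is hypothetical. Note in particular that the HMAC is only a standard-library stand-in for a real digital signature: it uses a shared secret rather than a public/private key pair, so it shows where a signature would attach, not how one should be produced.

```python
import hashlib
import hmac
import json
import zlib


def build_manifest(resources):
    """Map each resource path to the SHA-256 digest of its uncompressed bytes."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in resources.items()}


def sign_manifest(manifest, key):
    """Stand-in 'signature': an HMAC over a canonical JSON form of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(resources, signature, key):
    """Recompute the manifest from the resources and compare in constant time."""
    expected = sign_manifest(build_manifest(resources), key)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"demo-shared-secret"  # assumption: illustrative key only
    resources = {"index.html": b"<p>Hello</p>", "style.css": b"p { margin: 0 }"}
    signature = sign_manifest(build_manifest(resources), key)

    # Compression is layered on afterwards and undone before verification.
    compressed = {path: zlib.compress(data) for path, data in resources.items()}
    restored = {path: zlib.decompress(data) for path, data in compressed.items()}
    print(verify(restored, signature, key))
```

The sketch only works because the manifest has a well-defined place and canonical form. A packaging format with no stable way to enumerate its parts, or no slot for carrying the signature, would make this much harder, which is the sense in which the format has to enable signing.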
Here are a few more we should consider:
RFC 2557
MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
- adds Content-Location to multipart/related documents
- provides a "root" HTML document
- allows for nested (multi-document) structures
- encrypt-able, sign-able and compressible.
- can be created from Chrome, Opera, and Vivaldi (behind config flags)
- generated by every email client that sends HTML emails
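For anyone who has not looked inside an MHTML file, the following Python sketch builds a small multipart/related package with a Content-Location header on each part, using only the standard email package. The helper names build_mhtml and list_locations and the example.org URLs are hypothetical. Putting the root HTML document first reflects my understanding that the first part acts as the root when no start parameter is given.

```python
from email import message_from_bytes
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(parts):
    """Build a multipart/related package from (location, subtype, text) tuples.

    The first tuple is treated as the root HTML document.
    """
    package = MIMEMultipart("related", type="text/html")
    for location, subtype, text in parts:
        part = MIMEText(text, subtype, "utf-8")
        # Content-Location records the URL this part stands in for.
        part["Content-Location"] = location
        package.attach(part)
    return package.as_bytes()


def list_locations(data):
    """Parse a package and return the Content-Location of every leaf part."""
    parsed = message_from_bytes(data)
    return [part["Content-Location"] for part in parsed.walk() if not part.is_multipart()]


if __name__ == "__main__":
    data = build_mhtml([
        ("http://example.org/root.html", "html", "<link rel='stylesheet' href='style.css'><p>Hi</p>"),
        ("http://example.org/style.css", "css", "p { margin: 0 }"),
    ])
    print(list_locations(data))
```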
Web Packaging
Essentially a stream-able, Web-centric multipart/related approach
- package-wide headers https://www.w3.org/TR/web-packaging/#h-package-header
- provides a fragment identifier scheme for referencing parts as well as referencing "within" parts: http://example.org/downloads/editor.pack#url=/root.html;fragment=colophon
- defined list of allowed headers: https://www.w3.org/TR/web-packaging/#part-headers
- potentially still encrypt-able, sign-able, and compressible as above
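The fragment identifier scheme in the list above can be picked apart with a few lines of Python. This sketch handles only the shape of the example URL shown (semicolon-separated key=value pairs with url and fragment keys). It is not an implementation of the full grammar in the Web Packaging draft, and parse_package_reference is a hypothetical name.

```python
from urllib.parse import urldefrag


def parse_package_reference(reference):
    """Split a package reference into package URL, part URL, and inner fragment."""
    package_url, fragment = urldefrag(reference)
    fields = {}
    for item in fragment.split(";"):
        # Ignore anything that is not a key=value pair.
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return {
        "package": package_url,
        "part": fields.get("url"),
        "fragment": fields.get("fragment"),
    }


if __name__ == "__main__":
    print(parse_package_reference(
        "http://example.org/downloads/editor.pack#url=/root.html;fragment=colophon"
    ))
```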
Also of note is the research done by the TAG on this topic before they wrote the original Web Packaging specification: https://github.com/w3ctag/packaging-on-the-web#rejected-approaches
We might also explore SLEEP (Syncable Ledger of Exact Events Protocol), the format used by the Dat Project.
TAG feedback here should be taken into account in this thread if possible.