
WACZ futurism: mimetype and Pronom ID

Status: open. DiegoPino opened this issue 4 years ago (4 comments).

Good morning!

Just an idea. As early adopters of the WACZ format, we were thinking it would be nice to eventually have a specific way to identify WACZ files (before any processing). Requesting a new mimetype from IANA seems a bit out of scope (or not?), but we could build on the data package inheritance that happens in WACZ and on the extra parameters that can be passed to a mimetype.

See https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types and https://github.com/frictionlessdata/specs/tree/master/data-package, where the data package media type is application/vnd.datapackage+json. That seems to be for the JSON descriptor, but I'm not sure about a gzipped version of the full package (I may be lacking coffee here).

application/vnd.datapackage+json;parameter=value

or

application/zip;parameter=value?

application/vnd.datapackage+zip or +gzip?

For example, the parameter could be content and the value webarchive?

Too liberal?
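To make the parameter idea concrete, here is a minimal sketch of how a consumer could split such a media type into its base type and parameters, reusing the Content-Type parser from Python's standard `email.message.Message`. The `content=webarchive` parameter is hypothetical, taken from the suggestion above; it is not registered anywhere, and `parse_media_type` is an illustrative helper name.

```python
from email.message import Message


def parse_media_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a media type string into (base type, parameters).

    Uses the standard library's Content-Type parser; the base type is
    lowercased, and parameter names are lowercased here because they are
    case-insensitive.
    """
    msg = Message()
    msg["Content-Type"] = value
    params = {}
    # get_params() returns [(base_type, ''), (name, value), ...]; skip the base type.
    for name, param_value in msg.get_params()[1:]:
        params[name.lower()] = param_value
    return msg.get_content_type(), params


# Hypothetical candidate from the discussion; "content=webarchive" is not a registered parameter.
base, params = parse_media_type("application/zip;content=webarchive")
print(base, params)
```

A consumer that does not understand the parameter can still fall back to the base type, which is one argument for the parameter approach; the trade-off is that `application/zip` on its own says nothing about what is inside.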

Anyhow, just ideas. On our side, we can start by using application/vnd.datapackage+gzip.

Pronom goes a bit further by registering file content characteristics. It is still important for digital preservation to have at least some discussion; maybe someone from that realm could give us a hint.
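As an illustration of what a content characteristic can look like: identification tools match byte sequences at known offsets. A minimal Python sketch, assuming only the ZIP local file header signature (`PK\x03\x04`) at offset 0, shows why a WACZ file, which is a ZIP, would on its own be identified only as a generic ZIP. The helper names and the file names inside the archives are hypothetical.

```python
import io
import zipfile

# ZIP local file header signature; a ZIP with at least one entry
# normally starts with these four bytes at offset 0.
ZIP_LOCAL_HEADER = b"PK\x03\x04"


def has_zip_signature(data: bytes) -> bool:
    """Return True if the bytes start with the ZIP local file header signature."""
    return data[:4] == ZIP_LOCAL_HEADER


def make_zip(names: list[str]) -> bytes:
    """Build a small in-memory ZIP containing empty files with the given names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()


# A WACZ-like layout and an unrelated ZIP share the same leading bytes,
# so a signature at offset 0 cannot tell them apart.
wacz_like = make_zip(["datapackage.json", "archive/data.warc.gz"])
other_zip = make_zip(["readme.txt"])
print(has_zip_signature(wacz_like), has_zip_signature(other_zip))
```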

Thanks

DiegoPino, Dec 03 '20 14:12

Since an effort is now underway to make WACZ more of a standard, I think working towards an IETF media type might actually be a good idea. Registering a media type doesn't require the standard to be developed at the IETF.

edsu, Nov 24 '21 18:11

Pronom also came up today in another conversation. I'm not entirely sure what is needed for it, but I'm happy to explore if there is interest. The submission form appears to be here: https://www.nationalarchives.gov.uk/PRONOM/submit.htm

ikreymer, Dec 03 '21 00:12

I can help there. But it would be great to bring Frictionless Data Package into the discussion too, since Pronom actually uses file signatures for detection, and inside a ZIP it is hard to make a signature that isn't generic (see DROID: https://www.nationalarchives.gov.uk/information-management/manage-information/preserving-digital-records/droid/).
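One way around a generic ZIP signature is to look inside the container rather than at its first bytes. The following is a heuristic sketch, not a PRONOM or DROID signature: it assumes the WACZ layout of a `datapackage.json` at the archive root plus WARC data stored under `archive/`, and `looks_like_wacz` and `make_zip` are hypothetical helper names.

```python
import io
import zipfile


def looks_like_wacz(data: bytes) -> bool:
    """Heuristic container check for a WACZ-like ZIP (illustrative only).

    Assumes a datapackage.json at the archive root and WARC data under
    archive/. A plain data package without archive/ entries is treated
    as "not WACZ".
    """
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = zf.namelist()
    except zipfile.BadZipFile:
        # Not a ZIP at all (or a damaged one).
        return False
    has_descriptor = "datapackage.json" in names
    has_warc_dir = any(name.startswith("archive/") for name in names)
    return has_descriptor and has_warc_dir


def make_zip(names: list[str]) -> bytes:
    """Build an in-memory ZIP of empty files, for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "")
    return buf.getvalue()


print(looks_like_wacz(make_zip(["datapackage.json", "archive/data.warc.gz"])))  # True
print(looks_like_wacz(make_zip(["datapackage.json", "data.csv"])))  # False
print(looks_like_wacz(b"not a zip"))  # False
```

Even so, the decision still rests on file paths inside a generic container, which is exactly the difficulty raised here; a more specific marker would likely need agreement at the spec level, for example together with the Frictionless side.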


Diego Pino Navarro, Digital Repositories Developer, Metropolitan New York Library Council (METRO)

DiegoPino, Dec 03 '21 00:12

We're adding WACZ detection (and maybe parsing?) to Apache Tika now. As a temporary placeholder at least, is application/wacz appropriate?

https://issues.apache.org/jira/browse/TIKA-3696
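For extension-based identification, a placeholder like this can be wired in before any registration exists. A minimal sketch using Python's standard `mimetypes` module, assuming `application/wacz` (the placeholder proposed above, not an IANA-registered type) and the `.wacz` extension; this is not how Tika itself is configured, and the file name is hypothetical.

```python
import mimetypes

# Placeholder media type from the discussion above; not an IANA-registered type.
WACZ_PLACEHOLDER = "application/wacz"

# Map the .wacz extension to the placeholder in this process's mimetypes table.
mimetypes.add_type(WACZ_PLACEHOLDER, ".wacz")

media_type, encoding = mimetypes.guess_type("collection.wacz")
print(media_type, encoding)
```

Extension mapping is only a hint; a content-based detector would still need to look inside the ZIP, which brings back the signature question raised earlier in the thread.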

tballison, Mar 10 '22 12:03