External metadata files

Open Telkhine opened this issue 3 years ago • 10 comments

I don't know if this project is concerned with where the metadata file is stored, e.g. how ComicInfo.xml is stored inside a compressed comic archive. If it is, I would propose support for external metadata files.

An external metadata file would look something like this:

Comic File            | Metadata File
Batman 001 (1940).pdf | Batman 001 (1940).xml
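The naming convention in that example can be sketched in Python. This is only an illustration: it assumes the sidecar sits next to the comic and differs only in its final extension, and the function name `sidecar_path` is hypothetical, not part of any specification.

```python
from pathlib import Path


def sidecar_path(comic: Path, extension: str = ".xml") -> Path:
    """Return the path of the metadata sidecar that sits next to `comic`.

    Assumption: the sidecar shares the comic's directory and base name,
    and only the final extension differs.
    """
    return comic.with_suffix(extension)


print(sidecar_path(Path("Batman 001 (1940).pdf")))
```

For the file above this yields `Batman 001 (1940).xml`. A base name that contains a dot but has no real extension (for example `Spider-Man v1.5`) would need extra handling, because `with_suffix` treats `.5` as the extension.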

Here are a few reasons why this would be good:

  1. Not all comics are archives. Some are EPUB, PDF, etc.
  2. Editing a metadata file within an archive is generally not an atomic write operation, so the risk of corrupting the comic file itself is higher. It is better for a metadata file to become corrupt than a comic book file.
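To illustrate the second point: a standalone sidecar can be replaced with the usual write-then-rename pattern, which is much harder to do for a single entry inside an archive. A minimal sketch, assuming a POSIX-like filesystem and a hypothetical helper name `write_sidecar_atomically`:

```python
import os
import tempfile
from pathlib import Path


def write_sidecar_atomically(target: Path, xml_text: str) -> None:
    """Replace `target` so readers see the old or the new file, never a partial one."""
    # The temporary file must live in the same directory (same filesystem)
    # as the target, otherwise the final rename cannot be atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=target.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(xml_text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites the target; the rename is atomic on POSIX.
        os.replace(tmp_name, target)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If the process dies midway, the worst case is a stray `.tmp` file; the comic file is never opened for writing at all.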

Some projects which solved this problem in a similar way:

Telkhine, Nov 04 '20 20:11

This is indeed an axis of research and discussion, but a bit early given the data model is not finalized.

Offering multiple options is something I had in mind, either embedding the file in the archive where possible, or as a sidecar if not.

gotson, Nov 04 '20 23:11

I think calibre just uses the epub standard.

shimizurei, Dec 15 '20 13:12

While I normally hate external metadata like that, it's kinda necessary for PDFs. I don't use them because they can't integrate with ComicVine, but I would love to use them for things like comics from Humble Bundle, where the PDFs are of much higher quality than the CBZs, and converting them to CBZ while getting everything right is a long, annoying process.

Bitwolfies, Feb 07 '21 21:02

For what it's worth, PDFs do support embedded XML metadata files via XMP (see section 1.6.1 in this document). Whether any applications will read the metadata is another story, but at least this is a possible solution that doesn't require a sidecar file.

timgilbert, Feb 15 '21 20:02

Even if the solution ends up having a way to include metadata within the archive (which would be ideal), I think it would be good for the specification to also support external files as a fallback (probably as first described above). This would make it much easier to produce software that can add or modify archive metadata.

My reasoning behind this is to make it as simple as possible to work with archive metadata. It takes a lot of know-how and/or frameworks to work with PDF, for example, and that is probably beyond what many simple apps, scripts, or tools would want to get into. If someone wants to write a script or tool to work with archive metadata, but doesn't have the time, experience, or capability to work with something more complicated than a zip file (i.e. .pdf or .epub files), at least they have the option of putting the metadata into an external file.

And, while it probably goes without saying, we should definitely avoid anything even remotely proprietary. It annoys me greatly that people continue to release archives as .cbr files, when there is no real benefit over .zip (minor space savings), and pretty much any software or script can find a way to work with zip files. Much less so for RAR files. Because of this I personally convert .cbr to .cbz when I get them.

wyldphyre, Mar 26 '21 08:03

We have not yet reached the stage of implementation design for the exported format, but one thing that is important is to separate the model, the data format, and the container:

  • the model is a collection of objects, a relational model. Probably different from the one shown on the main page, because the exported model will be slightly different from the source of truth model (we may need to flatten it a bit).
  • the data format is a technical representation of the model. It could be XML, JSON, Avro, ProtoBuf, to name a few. That's a serialization format. They are not all equal, and each has pros/cons that will need to be evaluated.
  • the container is what holds the data. A common one is a simple file, but there are also zip headers, or maybe PDF information. The file could be inside the archive, or a sidecar with the same name. This is quite flexible, but would need to be documented, so clients don't have to handle too many cases.
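The three layers above can be sketched as separate pieces of code. Everything here is an assumption made for illustration: the `BookMetadata` fields, the choice of JSON, the embedded entry name `metadata.json`, and the lookup order (embedded first, sidecar second) are hypothetical and not decisions the project has made.

```python
import json
import zipfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional

# Hypothetical name for the embedded entry; the project has not defined one.
EMBEDDED_NAME = "metadata.json"


@dataclass
class BookMetadata:
    """Model: a flattened, purely illustrative subset of fields."""
    series: str
    number: str
    year: int


def serialize(meta: BookMetadata) -> bytes:
    """Data format: JSON here, but XML or ProtoBuf would slot in the same way."""
    return json.dumps(asdict(meta)).encode("utf-8")


def deserialize(raw: bytes) -> BookMetadata:
    return BookMetadata(**json.loads(raw.decode("utf-8")))


def load(comic: Path) -> Optional[BookMetadata]:
    """Container: look inside a zip archive first, then fall back to a sidecar."""
    if zipfile.is_zipfile(comic):
        with zipfile.ZipFile(comic) as archive:
            if EMBEDDED_NAME in archive.namelist():
                return deserialize(archive.read(EMBEDDED_NAME))
    sidecar = comic.with_suffix(".json")
    if sidecar.is_file():
        return deserialize(sidecar.read_bytes())
    return None
```

Because `serialize`/`deserialize` know nothing about where the bytes live, swapping the data format or adding another container touches only one function each, which is the point of keeping the layers apart.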

gotson, May 28 '21 02:05

Given that ePub and PDF both have embedded XML metadata, wouldn't a similar strategy be the right way here? I have a feeling that the layman user would discard external metadata files, not understanding that the files are linked even when they share the exact same basename.

cmargroff, Feb 02 '22 17:02

Yeah, use of an external file would probably be the last thing you'd want to do, in an ideal world. But if you lack the tooling/tech to be able to modify a particular file with metadata (PDF probably being a good example), it might be nice to have the option.

wyldphyre, Feb 03 '22 05:02

Also, as I said, editing a file directly is generally not an atomic write operation, so the risk of corrupting a comic file is higher. Some people might prefer not to modify the original source, to preserve data integrity. External metadata is a very nice option to have.

Telkhine, Feb 03 '22 05:02

The ComicBookInfo format took a unique approach in that they added their metadata to the zipfile comments. I thought this was genius until it became clear that updating zipfile comments also requires decompressing and recompressing the entire archive.

It seems like a text file embedded in the (hopefully not very) compressed archive is the best solution.

Ideally the text and metadata files would be the only compressed assets in the archive, and the image files would be STORED to speed up opening and recompressing the archive. Zip (or RAR or LZMA) compression is unlikely to improve upon image-specific compression schemes like JPEG, WebP and PNG.
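As a rough sketch of that layout, Python's standard `zipfile` module lets each entry pick its own compression method. The helper name `build_archive` and the set of image extensions are assumptions for illustration; `ComicInfo.xml` is used in the usage line only because it was mentioned earlier in the thread.

```python
import zipfile
from pathlib import Path
from typing import Iterable

# Formats that are already compressed; deflating them again gains little.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_archive(
    archive: Path, pages: Iterable[Path], metadata_name: str, metadata_text: str
) -> None:
    """Write `pages` plus one metadata entry, storing images uncompressed."""
    with zipfile.ZipFile(archive, "w") as zf:
        for page in pages:
            is_image = page.suffix.lower() in IMAGE_SUFFIXES
            method = zipfile.ZIP_STORED if is_image else zipfile.ZIP_DEFLATED
            zf.write(page, arcname=page.name, compress_type=method)
        # Text compresses well, so the metadata entry is deflated.
        zf.writestr(metadata_name, metadata_text, compress_type=zipfile.ZIP_DEFLATED)
```

A call such as `build_archive(Path("book.cbz"), pages, "ComicInfo.xml", xml_text)` would then produce an archive whose pages can be read without any decompression step.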

ajslater, Feb 03 '22 19:02