calibre-web

Feature Request: Polish/Embed metadata after applying metadata edits?

Open rmzg opened this issue 5 years ago • 10 comments

As the title says, if you have access to the calibre tools, use them to attempt to embed the metadata back into the actual ebook file.

This is something I was actually planning on working on but I wanted to know if there was some obvious reason this wouldn't work.

rmzg avatar Sep 04 '18 20:09 rmzg

In general there is no reason why this should not work

OzzieIsaacs avatar Sep 13 '18 06:09 OzzieIsaacs

I would like to try and implement this. @OzzieIsaacs would you prefer to use .opf files for this, like Calibre does, or would embedding the metadata directly suffice? Also, should this happen on download, metadata edits in general or via the GUI, whenever a user requests it?

Dunrar avatar May 30 '22 05:05 Dunrar

I would prefer to use the same files and the same format as calibre does, to be as compatible as possible. But I'm still not sure about this, as I have had no time so far to check calibre's behavior. From the calibre documentation I'm pretty sure that calibre keeps the opf file alongside the book files as a backup of the database. In contrast, some users seem to be very focused on these files, and I haven't figured out what they are doing with them. I understand the wish to change the metadata of the book to reflect the changes made in the library, especially a changed cover. My experiments so far show that calibre doesn't do this (as far as I have gotten up to now). So from my point of view these are 2 different things:

A) Writing the opf file. I see 2 solutions:

  1. As far as I have checked, calibre has a table to store metadata changes. This table could be used, and a background job (similar to the thumbnail one) could generate/update the .opf files on a schedule (the same configuration timestamp as for the thumbnails could be used).

  2. After every action where a user edits a book, the opf file could be updated.
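To make option A) more concrete, here is a minimal sketch of writing a small OPF document from a few metadata fields using only the Python standard library. It is not the exact layout calibre writes to its metadata.opf files; the function name `build_opf`, the chosen fields, and the `uuid_id` identifier are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes in the serialized XML instead of ns0/ns1.
ET.register_namespace("opf", OPF_NS)
ET.register_namespace("dc", DC_NS)


def build_opf(title, authors, book_uuid):
    """Return a minimal OPF 2.0 package as a string (hypothetical helper)."""
    package = ET.Element(
        f"{{{OPF_NS}}}package",
        {"version": "2.0", "unique-identifier": "uuid_id"},
    )
    metadata = ET.SubElement(package, f"{{{OPF_NS}}}metadata")

    # Dublin Core elements carry the actual book metadata.
    ET.SubElement(metadata, f"{{{DC_NS}}}title").text = title
    for author in authors:
        ET.SubElement(metadata, f"{{{DC_NS}}}creator").text = author
    identifier = ET.SubElement(metadata, f"{{{DC_NS}}}identifier", {"id": "uuid_id"})
    identifier.text = book_uuid

    return ET.tostring(package, encoding="unicode")


if __name__ == "__main__":
    print(build_opf("Example Title", ["Example Author"], "0000-example-uuid"))
```

Either solution above could call such a helper: the background job for every book whose metadata changed, or the edit handler right after the database commit.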

B) Changing the metadata of the files themselves. This will be a big topic: which formats to support? Epub, kepub, pdf, cbz, cbr, cbt (mobi, and what about audio files)? If we start with one format, somebody will show up and complain that their format isn't supported. For pdfs we have to consider that the files can be encrypted, and there are at least 2 ways to embed metadata. I don't know if epub files are easier: there are several ways to store a cover, and I'm not sure whether just replacing the cover file works (different cover resolutions?).

Anyhow: If you create a pull request please also take some time to write some tests and create a PR for the test repository (https://github.com/OzzieIsaacs/calibre-web-test)

OzzieIsaacs avatar May 30 '22 18:05 OzzieIsaacs

Okay, regarding A), I'll spend some time on finding out how Calibre deals with .opf files and metadata changes in general. Then we can more easily decide if we want to do it the same way or not. When it comes to B), I feel like any support is better than none, and people are complaining that there is no support right now. So how about starting with the most used or easiest ones to implement and incrementally add the others?

Dunrar avatar May 31 '22 10:05 Dunrar

Regarding A) excellent

> So how about starting with the most used or easiest ones to implement and incrementally add the others?

Yes, okay, in my opinion this would be pdf and epubs

OzzieIsaacs avatar May 31 '22 17:05 OzzieIsaacs

I read a bit in the calibre manual. New idea: Instead of starting the adventure to implement "applying metadata to an ebook by calibre-web", we should start with using calibre itself (https://manual.calibre-ebook.com/generated/en/calibredb.html#embed-metadata)
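A minimal sketch of that idea, assuming the calibre command-line tools are installed and on the PATH. The helper names (`build_embed_command`, `embed_metadata`), the timeout, and the example library path are hypothetical; only the `calibredb embed_metadata` subcommand with its `--with-library` and `--only-formats` options is taken from the calibre manual.

```python
import shutil
import subprocess


def build_embed_command(library_path, book_id, formats=None):
    """Build the argument list for `calibredb embed_metadata` (hypothetical helper)."""
    cmd = ["calibredb", "embed_metadata", "--with-library", library_path]
    # --only-formats is given once per format; without it all formats are updated.
    for fmt in formats or []:
        cmd += ["--only-formats", fmt]
    cmd.append(str(book_id))
    return cmd


def embed_metadata(library_path, book_id, formats=None, timeout=120):
    """Run calibredb if it is available; return False when calibre is not installed."""
    if shutil.which("calibredb") is None:
        return False
    subprocess.run(
        build_embed_command(library_path, book_id, formats),
        check=True,       # raise CalledProcessError on a non-zero exit code
        timeout=timeout,  # assumption: guard against a hanging calibre process
    )
    return True


if __name__ == "__main__":
    print(build_embed_command("/path/to/library", 42, ["epub", "pdf"]))
```

Passing the arguments as a list avoids shell quoting problems with library paths that contain spaces.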

OzzieIsaacs avatar Jun 04 '22 04:06 OzzieIsaacs

I had some time to look into the Calibre source, so I'll just dump some info first:

On changes to book metadata, the update is made to the database first, and from there to the .opf files. When it comes to embedding metadata, it is as you said: every format has very different capabilities for storing metadata and has to be handled separately.

Calibre has a few operations that embed metadata. Basically everything that moves files outside of Calibre and a few manual ones:

  • Save/Export
  • Send
  • Polish
  • Embed Metadata
  • Conversion (sometimes, more on that later)

I only looked at conversion and the manual ones (polish and embed_metadata), but save/export and send probably reuse the same code. As far as I can tell, embed_metadata embeds the metadata into the files directly from the DB (see embed_metadata() and get_metadata() in cache.py). Some MetadataWriterPlugins may have options that change this default behavior; I did not look at all of them. This would mean that the .opf files are not strictly necessary from a compatibility standpoint. Conversion hides an exception, which is also why it won't work for Calibre-Web.

By default Calibre seems to create a temporary .opf metadata file with information from the book file itself, which can be overwritten by information from a user-specified .opf file. Then, the right input plugin creates an OEBBook from the input file. After that, the metadata from the .opf gets merged into the OEBBook, which is then converted to the output format by the output plugin. If called by the GUI, the converter uses a second temporary .opf file with current DB data to update the information from the book file. If ebook-convert is called from the command line, then 'read_metadata_from_opf=None'. It's an edge case, because Calibre explicitly claims to only change book files if asked to or when they leave Calibre, but these files are being newly created.
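The precedence described above can be illustrated with a small sketch. This is not Calibre's merge code, which is considerably more involved; the function `merge_metadata` and the dictionary fields are hypothetical and only show that later sources override earlier ones, while missing values leave the earlier value in place.

```python
def merge_metadata(*layers):
    """Merge metadata dicts left to right; later layers win, None values are ignored."""
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if value is not None:
                merged[key] = value
    return merged


# Metadata as read from the book file itself (first temporary .opf).
from_book_file = {"title": "untitled", "authors": ["Unknown"], "language": "en"}
# Current database values (second temporary .opf when started from the GUI).
from_database = {"title": "Example Title", "authors": ["Example Author"], "language": None}

gui_result = merge_metadata(from_book_file, from_database)
# On the command line no database OPF is supplied, so only the book file is used.
cli_result = merge_metadata(from_book_file)

if __name__ == "__main__":
    print(gui_result)
    print(cli_result)
```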

I hope there is something useful in there.

Dunrar avatar Jun 05 '22 10:06 Dunrar

> I read a bit in the calibre manual. New idea: Instead of starting the adventure to implement "applying metadata to an ebook by calibre-web", we should start with using calibre itself (https://manual.calibre-ebook.com/generated/en/calibredb.html#embed-metadata)

Like with file conversion? But that would make having Calibre installed a requirement, right? Anyway, it's probably a good idea. On the other hand, embedding seems to be substantially easier than conversion, so it might be feasible with some "inspiration" from the Calibre source.

Dunrar avatar Jun 05 '22 10:06 Dunrar

So, I've started working on this, but there are a few options:

  1. We could simply provide an option to always embed metadata in the original files whenever there is a change. That's where calibredb embed_metadata would be a good fit.
  2. If we don't want to touch the original files, we could use either of:
     • calibredb export --dont-write-opf, with the needed formats specified.
     • ebook-meta --from-opf to update a temporary copy of a book file with an OPF and send/download/... that.

I think ebook-meta --from-opf only really makes sense with OPF files implemented, which should be done in Calibre-Web itself. But theoretically, OPF files could be created with calibredb show_metadata --as-opf id, calibredb export and calibredb backup_metadata as well.
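As a rough sketch of the two variants of option 2, assuming the calibre command-line tools are on the PATH: the helper names, the temporary directory prefix, and the use of `--single-dir` and `--to-dir` are my own assumptions; `--dont-write-opf`, `--formats`, and `ebook-meta --from-opf` are the options mentioned above.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_export_command(library_path, book_id, target_dir, formats):
    """Argument list for exporting one book with current metadata, without an OPF."""
    return [
        "calibredb", "export",
        "--with-library", library_path,
        "--dont-write-opf",
        "--single-dir",                  # assumption: keep all files in one flat directory
        "--to-dir", str(target_dir),
        "--formats", ",".join(formats),  # comma-separated list of formats
        str(book_id),
    ]


def build_ebook_meta_command(book_copy, opf_file):
    """Argument list for updating a temporary book copy from an OPF file."""
    return ["ebook-meta", str(book_copy), "--from-opf", str(opf_file)]


def export_book(library_path, book_id, formats):
    """Export into a fresh temporary directory and return the exported files."""
    if shutil.which("calibredb") is None:
        raise RuntimeError("calibredb not found; calibre must be installed")
    target_dir = Path(tempfile.mkdtemp(prefix="cw_export_"))
    subprocess.run(
        build_export_command(library_path, book_id, target_dir, formats),
        check=True,
    )
    return sorted(target_dir.iterdir())


if __name__ == "__main__":
    print(build_export_command("/path/to/library", 42, "/tmp/out", ["epub"]))
    print(build_ebook_meta_command("/tmp/out/book.epub", "/tmp/out/metadata.opf"))
```

The caller would be responsible for deleting the temporary directory once the file has been sent or downloaded.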

Right now I'm going the calibredb export route. Any objections?

Dunrar avatar Jun 19 '22 12:06 Dunrar

> Right now I'm going the calibredb export route.

Fine for me; please just keep an eye on calibre's runtime, which is (hopefully) no problem. Additionally, I agree that the opf backup files should be generated by calibre-web itself, so that there is also a backup of the metadata without calibre installed.

OzzieIsaacs avatar Jun 20 '22 11:06 OzzieIsaacs