Parse PDF Title, Author, Creation date and tags for metadata

Open Azarym opened this issue 2 years ago • 2 comments

Describe your suggested feature

PDFs contain metadata such as Title, Author, CreationDate and tags. It would be useful to parse these when analyzing the file and use them to populate the book information.

Other details

I see that the code uses org.apache.pdfbox.pdmodel.PDDocument to parse PDF documents. Its getDocumentInformation() method can retrieve this information.
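A minimal sketch of how those values could be mapped onto book fields, in Java. It is not Komga's code: `PdfInfoMapper` and `ParsedPdfInfo` are hypothetical names, treating the PDF Keywords entry as the source of tags is an assumption, and so is splitting keywords on commas and semicolons. The PDFBox calls appear only in a comment so that the block compiles without the library; they assume PDFBox 2.x, where `PDDocument.load(File)` exists.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;

final class PdfInfoMapper {
    // With PDFBox 2.x on the classpath, the raw values could be read like this:
    //   try (PDDocument doc = PDDocument.load(file)) {
    //       PDDocumentInformation info = doc.getDocumentInformation();
    //       return PdfInfoMapper.map(info.getTitle(), info.getAuthor(),
    //                                info.getCreationDate(), info.getKeywords());
    //   }

    /** Hypothetical holder for the fields a book could be populated with. */
    static final class ParsedPdfInfo {
        final String title;            // null when missing or blank
        final String author;           // null when missing or blank
        final LocalDate creationDate;  // null when the PDF has no CreationDate
        final List<String> tags;       // empty when the PDF has no keywords

        ParsedPdfInfo(String title, String author, LocalDate creationDate, List<String> tags) {
            this.title = title;
            this.author = author;
            this.creationDate = creationDate;
            this.tags = tags;
        }
    }

    static ParsedPdfInfo map(String title, String author, Calendar created, String keywords) {
        return new ParsedPdfInfo(
                blankToNull(title), blankToNull(author), toLocalDate(created), splitKeywords(keywords));
    }

    /** Trims the value and treats empty strings as absent. */
    static String blankToNull(String value) {
        if (value == null) return null;
        String trimmed = value.trim();
        return trimmed.isEmpty() ? null : trimmed;
    }

    /** Converts the Calendar to a date in the calendar's own time zone. */
    static LocalDate toLocalDate(Calendar calendar) {
        if (calendar == null) return null;
        return calendar.toInstant().atZone(calendar.getTimeZone().toZoneId()).toLocalDate();
    }

    /** Assumption: keywords are separated by ',' or ';'. Duplicates are dropped, order is kept. */
    static List<String> splitKeywords(String keywords) {
        List<String> tags = new ArrayList<>();
        if (keywords == null) return tags;
        for (String part : keywords.split("[,;]")) {
            String tag = part.trim();
            if (!tag.isEmpty() && !tags.contains(tag)) tags.add(tag);
        }
        return tags;
    }
}
```

Keeping the mapping separate from the PDFBox call means blank or missing fields turn into null or an empty list instead of being copied into the book as empty strings.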

Acknowledgements

  • [X] I have searched the existing issues and this is a new ticket, NOT a duplicate or related to another open issue.
  • [X] I have written a short but informative title.
  • [X] I have updated the app to the latest version.
  • [X] I will fill out all of the requested information in this form.

Azarym avatar Sep 29 '23 23:09 Azarym

Duplicate of #277

The biggest hurdle is that most PDFs contain really crappy metadata.

gotson avatar Sep 30 '23 00:09 gotson

Sorry for the duplicate, I didn't go far enough in the list of issues.

My understanding is that the crappy metadata comes from random documents (user manuals, specs). I think Komga is intended for reading books, and the authors of books generally try to keep this information clean, as a kind of signature (at least for Title, Author and ModDate).

I think it would be good to parse this information, and if it is bad we can always edit it inside Komga.
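One way to read this suggestion, sketched in Java under assumptions: a manual edit always wins, the parsed PDF title is used otherwise, and the file name is the last resort. `TitleResolver` is a hypothetical name, and both the file-name fallback and the ordering are assumptions for illustration, not a description of how Komga behaves.

```java
final class TitleResolver {
    /** Returns the first usable title: manual edit, then PDF metadata, then the file name without extension. */
    static String resolve(String userEdited, String pdfTitle, String fileName) {
        for (String candidate : new String[] {userEdited, pdfTitle}) {
            if (candidate != null && !candidate.trim().isEmpty()) {
                return candidate.trim();
            }
        }
        // Neither source is usable: strip the extension from the file name, if there is one.
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }
}
```

With this ordering, bad parsed metadata never overwrites a value the user has already corrected.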

Azarym avatar Sep 30 '23 13:09 Azarym