edit metadata (author, title, bookmarks, ...)

Open darkdragon-001 opened this issue 7 years ago • 4 comments

Please add a replacement for pdftk's dump_data and update_info to edit bookmarks.

darkdragon-001 avatar Jul 04 '18 11:07 darkdragon-001

I'm not entirely sure if the underlying library supports it. But if it does, this looks like a good first issue for someone.

hellerbarde avatar Feb 19 '20 00:02 hellerbarde

Any news about this? I'm not referring to the bookmarks part of the question.

I'm wondering about the functionality for writing metadata. Stapler already supports reading metadata tags:

stapler info print.pdf

*** Metadata for print.pdf
    /Title:  my Title
    /Author:  my Name
    /Keywords:  keyword1;keyword2
    /Subject:  my Subject

This is the same thing you can do with:

$ pdfinfo print.pdf

Title:          my Title
Subject:        my Subject
Keywords:       keyword1;keyword2
Author:         my Name

But would writing be possible? For example, as with exiftool:

exiftool -Title="my Title" -Author="my Name" -Subject="my Subject" -Keywords="keyword1;keyword2" mypdffile.pdf

I'm not entirely sure if the underlying library supports it.

The "underlying library"? Do you mean these two, python-pypdf2 and python-more-itertools?
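On the library question, here is a minimal sketch of what writing the tags could look like. It assumes a recent PyPDF2 (2.0 or later) or its successor pypdf is installed. The function names to_info_keys and write_metadata are made up for illustration and are not part of stapler.

```python
def to_info_keys(tags):
    """Map plain names like 'Title' to PDF info-dictionary keys like '/Title'."""
    return {
        (name if name.startswith("/") else "/" + name): value
        for name, value in tags.items()
    }


def write_metadata(src_path, dst_path, tags):
    """Copy the pages of src_path to dst_path and set the given info entries."""
    # Assumption: PyPDF2 >= 2.0 or pypdf is available; both expose these names.
    try:
        from PyPDF2 import PdfReader, PdfWriter
    except ImportError:
        from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Only the tags passed in are written. Entries already present in the
    # source file, and its bookmarks, are not carried over by this sketch.
    writer.add_metadata(to_info_keys(tags))

    with open(dst_path, "wb") as out:
        writer.write(out)
```

A call such as write_metadata("print.pdf", "tagged.pdf", {"Title": "my Title", "Author": "my Name"}) would write a new file rather than editing in place.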

m040601 avatar Aug 16 '20 11:08 m040601

I was asking myself whether this is really something one wants to do in the console, or whether it is more likely that one wants to edit single documents interactively. In the latter case, I recommend https://github.com/pdfarranger/pdfarranger (at least for metadata tags).

With read support already built in, I think write support with a syntax similar to exiftool would be good (stapler info -title="My Title" print.pdf). Reading a single metadata tag for further processing could also be implemented (stapler info -title print.pdf should output only My Title).
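To make the proposed syntax concrete, here is a rough sketch of how such arguments could be split into "set" and "print" requests. The function parse_tag_args and the KNOWN_TAGS table are hypothetical, not existing stapler code. The shell has already removed the quotes by the time the program sees the arguments, so -title="My Title" arrives as -title=My Title.

```python
# Hypothetical mapping from option names to PDF info-dictionary keys.
KNOWN_TAGS = {
    "title": "/Title",
    "author": "/Author",
    "subject": "/Subject",
    "keywords": "/Keywords",
}


def parse_tag_args(args):
    """Split exiftool-style arguments into tags to write, tags to read, and files.

    '-title=My Title' means "set Title"; a bare '-title' means "print Title".
    Anything that does not start with '-' is treated as a file name.
    """
    writes, reads, files = {}, [], []
    for arg in args:
        if not arg.startswith("-"):
            files.append(arg)
            continue
        name, sep, value = arg[1:].partition("=")
        try:
            key = KNOWN_TAGS[name.lower()]
        except KeyError:
            raise ValueError(f"unknown tag option: {arg!r}")
        if sep:
            writes[key] = value   # '-title=...' requests a write
        else:
            reads.append(key)     # bare '-title' requests a read
    return writes, reads, files
```

Restricting the options to a fixed table keeps typos from silently creating new tags. A real implementation might prefer to allow arbitrary keys.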

For bookmarks, it might be possible to download or extract the table of contents as Title>PageNum lines. Scripts could then use these to add PDF bookmarks correctly.
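As a sketch of the script side, assuming one Title>PageNum entry per line (this exact file layout is an assumption, and parse_toc is a made-up name), the list could be read like this:

```python
def parse_toc(text):
    """Parse 'Title>PageNum' lines into a list of (title, page) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last '>' so that titles may themselves contain '>'.
        title, sep, page = line.rpartition(">")
        if not sep or not page.strip().isdigit():
            raise ValueError(f"expected 'Title>PageNum', got: {line!r}")
        entries.append((title.strip(), int(page)))
    return entries
```

The resulting (title, page) pairs could then be handed to whatever bookmark-writing call the PDF library offers.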

darkdragon-001 avatar Aug 16 '20 12:08 darkdragon-001

if this is really something one wants to do in the console

Yes, it is, for me at least. Thanks for the tip about pdfarranger, but I already knew about it, as well as every other GUI PDF app on the planet for editing PDF metadata.

Stapler being a CLI app is the only reason I use it. Originally I picked it because I didn't want the Java dependency of pdftk and wanted a light, well-maintained replacement with the same functionality.

I insist on the "metadata" editing functionality, not on the bookmark management part of the question (which was not asked by me).

I do this because I think it would be very interesting and useful for a tool with the goals of stapler. Perhaps it helps if I make clear why support for editing metadata in a PDF CLI tool is important to me, and what I want it for.

It's mainly about managing large pdf collections.

This need to manage PDF metadata well arose over the last few years, along with the need to manage my collection of PDFs. Again, I already know all the GUI apps for managing a PDF collection. That's not what I want.

With the increasing use of ebook readers (Kobo, Kindle, etc.) and the huge growth of my PDF collection, which covers many different types and sources, from real books to simple documents, I am faced with the same problem I already solved for my music collection.

And just like with my music collection, I don't expect the system to be perfect. I don't want to obsess about an impossible, perfect classification and file organization.

I also found that renaming PDF files or trying to organize them neatly in folders would never be a sufficient method of organizing.

But I don't want to be tied to a GUI app, a proprietary tool, or a database. I want to use my file system and the tools and scripts I am comfortable with for shuffling, copying, and moving what I want, without being tied to a central database.

Just like with music metadata tags (mp3 id3v2, apple mp4/m4a/aac tags), the standards were never perfect and well documented. Read more about "pdf tags" here:
- https://en.wikipedia.org/wiki/PDF#Metadata
- https://exiftool.org/TagNames/PDF.html
- https://www.linuxuprising.com/2019/07/how-to-edit-pdf-metadata-tags-on-linux.html

And just like with mp3/mp4 files, you can never expect to get something consistent from wherever you downloaded your PDF or music files. Some put "Author" or "Title" or tag XYZ. Some put nothing. Some use XML-style tags. Some use DC-style tags. Some use PDF version 1.4, some version 1.6, etc.

So it's up to me to do the cleaning and organizing. As said above, I don't want to obsess about this organizing, waste too much time, or use complicated tools.

I don't want much. I am satisfied if I can at least make sure all the PDF files in my collection have those simple, well-supported tags: "Title", "Author", "Subject" and "Keywords". With that part solved, and if you are a command-line user, I don't need to explain what you can achieve further with simple shell scripts, pipes, and batch processing: a simple Unix-style solution for searching and organizing your digital objects.

m040601 avatar Aug 16 '20 12:08 m040601