HagenbergThesis
Create PDF/A documents
New efforts on archiving and publishing theses (throughout the whole FH OÖ) have brought up the requirement that archived theses should be in PDF/A format to prevent modification. We should therefore update the PDF generation process to make this possible with the template.
I suggest adding an additional package parameter since it is probably not always desired to create a PDF/A.
It has not been decided which PDF/A standard will be required (there are PDF/A-1 to PDF/A-4 with different subtypes: a, b, u, etc.). We should probably check which ones are feasible and which ones aren't (the accessible versions, e.g., PDF/A-1a, will be problematic with LaTeX as far as I can tell). There might still be potential to influence the final decision if we bring up valid technical requirements/limitations.
@rru-hgb, if you have updates or input on the requirements, please let us know.
@hochleitner
I tried an initial setup for creating PDF/A files in a new branch pdf-a (commit daddd5350ae4d23c895a557b3eeca873c70dfd5e), for document HgbThesisTutorialEN. It uses the pdfx package, creating PDF/A-2b files. Most things went smoothly; surprisingly, all figures ran through without problems. The exceptions were the following:
- The hyperref setup needs to be changed slightly; I deactivated warnings that are unavoidable.
- The \euro character has a problem with dimensions; I deactivated it (needs to be fixed).
- The included fragebogen.pdf was not compliant; I replaced it with a fixed version.
- I set the PDF minor output version to 7 to stop some warnings with included figures.
One not-so-elegant aspect is that the PDF metadata need to be contained in a separate file (main.xmpdata), which is created in the preamble of main.tex (before \begin{document}). The associated entries currently cannot be filled in automatically from the author/title definitions, since these are defined later. Can this be changed?
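For readers who have not used pdfx, a minimal sketch of such a setup could look like the following. This is not the code from the pdf-a branch; the title, author, and keyword values are placeholders, and placing \pdfminorversion before \documentclass is an assumption (it has to be set before anything is written to the PDF). The [overwrite] option of filecontents* assumes a LaTeX release from 2019-10 or later.

```latex
% Minimal sketch (not the template's actual code): PDF/A-2b with pdfx.
% The metadata file must exist before pdfx is loaded, because pdfx
% reads \jobname.xmpdata at load time.
\begin{filecontents*}[overwrite]{\jobname.xmpdata}
  \Title{Example Thesis Title}   % placeholder value
  \Author{Jane Doe}              % placeholder value
  \Language{en-US}
  \Keywords{PDF/A\sep LaTeX}     % \sep separates multiple entries
\end{filecontents*}

% pdfTeX primitive; must be set before any data is written to the PDF.
\pdfminorversion=7

\documentclass{article}
\usepackage[a-2b]{pdfx}          % loads hyperref internally

\begin{document}
Hello, PDF/A.
\end{document}
```

Because the metadata file is written verbatim, it cannot pick up \title or \author definitions that appear later in the document, which is the limitation described above.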
Some relevant links:
- https://mirror.easyname.at/ctan/macros/latex/contrib/pdfx/pdfx.pdf
- https://webpages.tuni.fi/latex/pdfa-guide.pdf
- Online PDF validator: https://www.pdf-online.com/osa/validate.aspx
One should also look into hyperxmp as an alternative (see https://tex.stackexchange.com/questions/150221/pdfx-package-leads-to-non-working-hyperref-links).
I'll have a look at it asap. I've read up on the topic as well, and it seems a whole bunch of things are involved. Metadata, color intents, oof.
I've added a new Release 2024 milestone, where I'll add all the relevant issues for next year's release. I think a 1-year release cycle with a target of the end of February might work well. If necessary, we can always add a fall release.
Great. In the meantime I checked hyperxmp; unfortunately, it does not work as expected and I have no idea how to fix it.
However, both hyperxmp and pdfx may be obsolete anyway, because some recent additions to LaTeX itself make PDF/A creation a lot easier. I am currently testing this, looks good so far ...
Just pushed a new and IMO much better variant of PDF/A generation in branch pdf-a-l3. It is based on the forthcoming LaTeX kernel functions for PDF management and extremely simple to use. The only caveat is that Overleaf has no recent version of the pdfmanagement-testphase package (version 0.95s or higher is needed) and thus is not compliant yet. But this will change and I think this is the right path to go.
- I had to fix some minor issues with hyperref (what else?) but all in all the setup works as before.
- The eurosym Euro symbol is corrupted (font metric error); I replaced the package by marvosym.
- There is a new package hgbpdfa.sty with only a few lines now, but I thought it would be easier to maintain if anything else (color profiles etc.) needs to be added later.
- hgbpdfa.sty needs to be loaded before the \begin{document} command, i.e., before anything is written to the PDF! It is thus not possible to use a document option. Instead, users must comment out a single line if they do not want PDF/A compliance.
- Generally, I think we should make PDF/A the default in all documents; I see no reason why not to use it.
- I added a section in the tutorial (EN only) under "Printing", including hints for validating PDFs. Also added stuff to the manual.
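As a rough illustration of the kernel-based approach, here is a minimal sketch. It does not reproduce the contents of hgbpdfa.sty; it assumes a LaTeX release that provides \DocumentMetadata (2022-06 or later) together with a sufficiently recent pdfmanagement-testphase, and the title and author values are placeholders. The marvosym package and its \EUR symbol are included only to show the Euro replacement mentioned above.

```latex
% Minimal sketch (not the contents of hgbpdfa.sty): PDF/A-2b via the
% LaTeX PDF management code. \DocumentMetadata must come first,
% before \documentclass.
\DocumentMetadata{
  pdfstandard = A-2b,
  pdfversion  = 1.7,
  lang        = en-US,
}
\documentclass{article}

\usepackage{marvosym}   % provides \EUR as a Euro symbol
\usepackage{hyperref}
\hypersetup{
  pdftitle  = {Example Thesis Title},  % placeholder value
  pdfauthor = {Jane Doe},              % placeholder value
}

\begin{document}
The fee is 10\,\EUR.
\end{document}
```

With this setup no separate metadata file is needed, since the PDF metadata come from the regular hyperref keys.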
@hochleitner Pls. look at it carefully. If adopted, what else remains to do:
- Copy all style/class files from TutorialEN to dev.
. - Update the German tutorial, including new screenshot.
- Add PDF/A to other documents.
Here is a useful link: https://ctan.org/tex-archive/macros/latex/contrib/pdfmanagement-testphase
The remaining points are completed (translated to TutorialDE, files copied to dev).
All documents (except the article) are now set up to produce PDF/A.
Note: The report-based docs had to be modified (author/title definitions moved before \begin{document} to avoid hyperref errors). Perhaps this should be looked into again.
Made a full rebuild, validated all PDFs. Checked with Overleaf (currently throws a warning, no PDF/A is created).
TODO: mention PDF/A in README, add link to online validator.
I fixed files hgbarticle.cls and hgbreport.cls to allow author/title declaration after \begin{document}, by adding the hypersetup to the maketitle hook. All affected documents were reverted. Also, I moved \RequirePackage[utf8]{inputenc} to the top of all documents.
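The idea of deferring the metadata setup to \maketitle can be sketched as follows. This is only an illustration and not the actual code in hgbarticle.cls or hgbreport.cls; it assumes pdfLaTeX and a LaTeX release with generic command hooks (2021-06 or later), and the title and author values are placeholders.

```latex
% Minimal sketch (not the actual class code): copy title and author
% into the PDF metadata right before \maketitle runs, so that they
% may be declared after \begin{document}.
\documentclass{article}
\usepackage{hyperref}

\makeatletter
% Generic hook executed just before \maketitle; at this point \@title
% and \@author are still defined (article clears them afterwards).
\AddToHook{cmd/maketitle/before}{%
  \hypersetup{pdftitle={\@title}, pdfauthor={\@author}}%
}
\makeatother

\begin{document}
\title{Example Thesis Title}  % placeholder value
\author{Jane Doe}             % placeholder value
\maketitle
Some text.
\end{document}
```

Authors lists using \and or titles with heavy formatting may need extra care, since hyperref has to convert them to plain PDF strings. hyperref's pdfusetitle option is an existing alternative that serves a similar purpose.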
Removed remaining occurrences of \citenobr.
Added a short section on PDF/A and links in README.md.
Plus another full rebuild. Everything looks good now, pls. check the PR.
@hochleitner We should do a repo cleanup soon! It currently has ca. 650MB (IMO too big for cloning). Just tried: it can be reduced to 54MB by removing old PDF and ZIP files.
Okay, it took me ages to finally check it all out - sorry for the huge delay.
Here are my thoughts:
- First of all, thanks for all the experiments; that's quite some stuff to read.
- I agree; we should go with the L3 features. It is the most promising, simple, and future-proof version.
- The pdfmanagement-testphase issue with Overleaf is problematic, but I think we could add a recent version to our latex-foreign folder for now so that projects on Overleaf have a current version. This package changes quite often (we're at 0.95x now), but having a minimum version present would solve the issue on Overleaf. A new TeX Live release will happen in late summer, and having the main branch not working on Overleaf is something we most definitely don't want.
- Yes, we should make PDF/A the default and not even give people the option to choose. I see no obvious downside except that people might have to fiddle with included graphics. But if we provide an easy way to turn this off, people will turn it off to make it easy. We can give some tips in the wiki on how to deal with included files.
- The eurosym issue is funny, considering that we used to have marvosym (I just removed traces of it while reworking the tutorial documents), and now we're back again. 😬
I wonder how much effort it is to reach PDF/A-2a. Because proper accessibility is something we need to tackle sooner or later, maybe the l3 functions will improve this workflow too.
In the latest commit (b6d93dc4d16d1c5ca259716067caa9a0575a7608) I tested the idea of making local copies of the recent pdfmanagement-testphase files inside the project directory. It requires the following 9 files:
pdfmanagement-testphase/pdfmanagement-testphase.sty
pdfmanagement-testphase/pdfmanagement-testphase.ltx
pdfmanagement-testphase/l3backend-testphase-pdftex.def
pdfmanagement-testphase/pdfmanagement-firstaid.sty
pdfmanagement-testphase/l3ref-tmp.sty
tagpdf/tagpdf-base.sty
l3experimental/l3bitset/l3bitset.sty
l3backend/l3backend-pdftex.def
latex-lab/documentmetadata-support.ltx
This works locally as expected. I then uploaded a zipped version to Overleaf, which immediately complains that some of the files cannot be parsed(!). The resulting output is PDF/A (at least Acrobat Reader says so, not tested for actual compliance) but contains garbage before the document starts.
In summary, I do not think this is a viable option. The project directory is messed up and it still does not work. We should wait for Overleaf to update their LaTeX environment.
So, I deliberately waited this long to add something to the issue of PDF/A creation (cough, cough), but at least there is good news. Overleaf now includes pdfmanagement-testphase 0.95x, so the PDF/A creation runs through fine for me.
I uploaded the HgbThesisTutorialDE folder from the #160 PR, producing a valid PDF/A-2b without errors. Both Acrobat and online PDF/A validators confirm it.
So, it seems we should be able to merge #160 and make PDF/A generation a default in the main branch. Could you please test this again yourself just to make sure I did not overlook something? It is a significant change after all.
Looks good! I tested the PDF/A setup with the most recent MiKTeX update and also on Overleaf. I suppose we can merge the pdf-a-l3 branch ...