Embed source code in PDF

Open realpixelcode opened this issue 2 years ago • 4 comments

Description

Similarly to draw.io, there should be an option to embed the source code in the compiled PDF file, so that the code wouldn't be lost.

Use Case

You'd be able to easily pass around editable documents, sending just the PDF file (instead of PDF + TYP). Also, when given a PDF file whose layout you really like, you'd be able to use it as a template if it contains the source code.

realpixelcode avatar Jun 10 '23 23:06 realpixelcode

What would this look like in terms of multiple file support?

All code expanded inline, or headings with a file path followed by the code in that file?

cook-f avatar Jun 17 '23 13:06 cook-f

What would this look like in terms of multiple file support?

All code expanded inline, or headings with a file path followed by the code in that file?

I agree! 👍

rthandi avatar Jun 17 '23 13:06 rthandi

This was mentioned in https://forum.typst.app/t/is-it-possible-to-store-the-source-in-the-generated-pdf/564/2.

Andrew15-5 avatar Sep 22 '24 13:09 Andrew15-5

What would this look like in terms of multiple file support?

I think it should work in an archive-like way:

  • The typst compile command (probably with a flag like --embed-source) should record the whole file structure relative to the project root directory and write it to the PDF's metadata. Only files used by the compiled document should be included.
  • The typst extract command should recreate that file structure in the current directory, or in a directory specified with a flag.

See the example below for more details:

$ tree
.
|-- res
|   `-- logo.jpg
`-- src
    |-- body.typ
    |-- main.typ
    `-- unused.typ

$ cd src

$ cat main.typ

#image("/res/logo.jpg")
#include "body.typ"
// NOTE: unused.typ is not used here

$ typst compile --root .. --embed-source main.typ

$ cp main.pdf /path/to/some_other_dir

$ cd /path/to/some_other_dir

$ typst extract main.pdf

$ tree # NOTE: unused.typ is not here

.
|-- main.pdf
|-- res
|   `-- logo.jpg
`-- src
    |-- body.typ
    `-- main.typ
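
The path bookkeeping behind this proposal can be sketched in Python. This is only an illustration, not how Typst works: --embed-source and typst extract are proposed commands that do not exist, the names collect_sources and restore_sources are hypothetical, and the list of used files is assumed to be supplied by the caller rather than discovered by the compiler.

```python
from pathlib import Path


def collect_sources(root, used_files):
    """Map each used file to its POSIX path relative to the project root."""
    root = Path(root).resolve()
    bundle = {}
    for file in used_files:
        path = Path(file).resolve()
        # relative_to raises ValueError for files outside the root,
        # so only files inside the project can end up in the bundle.
        rel = path.relative_to(root).as_posix()
        bundle[rel] = path.read_bytes()
    return bundle


def restore_sources(bundle, target_dir):
    """Recreate the recorded file structure below target_dir."""
    target = Path(target_dir).resolve()
    written = []
    for rel, data in sorted(bundle.items()):
        dest = (target / rel).resolve()
        # Refuse entries that would escape the target directory.
        if target not in dest.parents:
            raise ValueError(f"unsafe path in bundle: {rel}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(data)
        written.append(rel)
    return written
```

With the tree from the example, passing main.typ, body.typ and logo.jpg as used_files would leave unused.typ out of the bundle, and restoring the bundle elsewhere would recreate res/ and src/ below the target directory.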

kotfind avatar Sep 23 '24 16:09 kotfind

qpdf --add-attachment test.typ -- test.pdf new.pdf

This attaches the Typst source used to make the PDF, so the document can be modified later. Kinda like a LibreOffice Hybrid PDF.

qpdf --list-attachments new.pdf
test.typ -> 78,0

Then, once you have modified test.typ, just compile a new PDF:

typst compile test.typ --open

Yes, the Typst text file gets compressed in the PDF.

To extract the attachment, open the PDF in Firefox / LibreWolf, click the attachment icon, then click the Typst file to save it to disk.

mrfragger avatar Feb 04 '25 11:02 mrfragger

qpdf --add-attachment test.typ -- test.pdf new.pdf attaches a typst source to make the PDF so it can be modified. Kinda like LibreOffice Hybrid PDF. qpdf --list-attachments new.pdf test.typ -> 78,0

Great! It even supports multiple attachments. The key of an attachment could be set to its containing directory relative to the Typst project folder.
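
A rough sketch of that idea in Python, building a single qpdf command line that attaches several files. It assumes qpdf's --key option for --add-attachment and uses each file's path relative to the project folder as the key (not just its directory, so that keys stay unique). The function name qpdf_attach_command is hypothetical, and nothing is executed here.

```python
from pathlib import Path


def qpdf_attach_command(root, sources, pdf_in, pdf_out):
    """Build a qpdf argv that attaches every source under a root-relative key."""
    root = Path(root)
    cmd = ["qpdf"]
    for source in sources:
        source = Path(source)
        # Assumption: the key is the file's path relative to the project root.
        key = source.relative_to(root).as_posix()
        # Each attachment's option list is terminated by "--".
        cmd += ["--add-attachment", str(source), f"--key={key}", "--"]
    cmd += [str(pdf_in), str(pdf_out)]
    return cmd
```

The returned list could be passed to subprocess.run(cmd, check=True) on a machine where qpdf is installed.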

realpixelcode avatar Feb 04 '25 12:02 realpixelcode

The native workaround:

Add #pdf.embed("file.typ") to the document.
Compile it: typst c file.typ
Check the PDF attachments.

https://github.com/typst/typst/pull/5221

Andrew15-5 avatar Feb 04 '25 17:02 Andrew15-5

Ah, I will try the native way in a second. I wasn't aware of it at all.

If you need to extract attachments from the command line, for example in a batch, you can use this script, pypdfextractattachments.py:

python3 pypdfextractattachments.py some.pdf

Then, for a batch:

for f in *.pdf ; do python3 pypdfextractattachments.py "$f" ; done

import os
import sys
from pypdf import PdfReader

def extract_attachments(pdf_path):
    """Save every attachment of the PDF into the directory of the PDF."""
    try:
        reader = PdfReader(pdf_path)

        if not reader.attachments:
            print("No attachments found in the PDF.")
            return

        pdf_directory = os.path.dirname(pdf_path)

        # reader.attachments maps each attachment name to a list of contents.
        for name, content_list in reader.attachments.items():
            for i, content in enumerate(content_list):
                # The index suffix keeps same-named attachments from overwriting each other.
                attachment_filename = os.path.join(pdf_directory, f"{name}-{i}")
                with open(attachment_filename, "wb") as fp:
                    fp.write(content)
                print(f"Saved attachment: {attachment_filename}")

    except FileNotFoundError:
        print(f"Error: The file '{pdf_path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {str(e)}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python3 pypdfextractattachments.py <path_to_pdf>")
        sys.exit(1)

    pdf_path = sys.argv[1]
    extract_attachments(pdf_path)

mrfragger avatar Feb 04 '25 23:02 mrfragger

OK, I had to install the Rust dev version. It works well, and the compression is almost as good: 563 KB with the Typst embed compared to 547 KB with the qpdf attachment. The .typ file is 225 KB and the original PDF is 497 KB. I will test later on a .typ file 5x bigger: a 20-hour transcription rather than a 4-hour one.

I'd say the only thing discouraging me from using this approach is that it seems impossible to set the embedded filename to that of the document automatically. Could a CLI argument be added so we can pass the filename as a variable?

mrfragger avatar Feb 05 '25 00:02 mrfragger

I'd say only thing discouraging me from using this way would be it seems impossible to specify filename to be that of the document automatically.

This is why this issue is still open.

Andrew15-5 avatar Feb 05 '25 01:02 Andrew15-5

this should work...will test soon

read -p "Press ENTER to compile PDF with typst"
echo '#pdf.embed("$completetitle".typ)' > "$completetitle"_new.typ
cat ~/.config/mpv/extrastuff/bashscripts/pdftemplate.typ >> "$completetitle"_new.typ
cat all.md | sed -E -e 's/*{2,8}/\\\*/g' -e 's/$/\$/g' -e 's/#/\#/g' -e "s/`/'/g" -e 's/@/\@/g' >> "$completetitle"_new.typ
echo ""
echo "make proper centered Chapter headers"
read -p "Press ENTER to split headers in half (if over 44 characters) with a \ (backslash) "
gawk '/^= [0-9]+ / && length($0) > 44 {n = int(length($0)/2); for (i = n; i > 0; i--) if (substr($0, i, 1) == " ") break; print substr($0, 1, i) "\ " substr($0, i+1); next} 1' "$completetitle"_new.typ > "$completetitle".typ
rm "$completetitle"_new.typ
typst compile "$completetitle".typ --open

as compared to

read -p "Press ENTER to compile PDF with typst"
cat ~/.config/mpv/extrastuff/bashscripts/pdftemplate.typ > "$completetitle"_new.typ
cat all.md | sed -E -e 's/*{2,8}/\\\*/g' -e 's/$/\$/g' -e 's/#/\#/g' -e "s/`/'/g" -e 's/@/\@/g' >> "$completetitle"_new.typ
echo ""
echo "make proper centered Chapter headers"
read -p "Press ENTER to split headers in half (if over 44 characters) with a \ (backslash) "
gawk '/^= [0-9]+ / && length($0) > 44 {n = int(length($0)/2); for (i = n; i > 0; i--) if (substr($0, i, 1) == " ") break; print substr($0, 1, i) "\ " substr($0, i+1); next} 1' "$completetitle"_new.typ > "$completetitle".typ
rm "$completetitle"_new.typ
typst compile "$completetitle".typ --open
sleep 6
qpdf --add-attachment "$completetitle".typ -- "$completetitle".pdf new.pdf
mv "$completetitle".pdf "$completetitle"_no_typ.pdf
mv new.pdf "$completetitle".pdf
mv "$completetitle".pdf ../source

mrfragger avatar Feb 05 '25 01:02 mrfragger

Typst's compression is almost on par with qpdf's for embedding .typ files.

mrfragger avatar Feb 05 '25 07:02 mrfragger

Maybe this should be the default? Or would the increase in file size be too big of an issue?

enkvadrat avatar Sep 10 '25 15:09 enkvadrat

Maybe this should be the default? Or would the increase in file size be too big of an issue?

Having fully reproducible PDFs from source would be incredible. Perhaps to save size, the same resources used for e.g. images don't have to be included twice. While the original image resolution may be higher than the one used for the output PDF, just reusing the already embedded assets could help with the file size.

mewmew avatar Sep 10 '25 15:09 mewmew

Doing this by default would be highly surprising to me as a user. That's as if rustc embedded my source code in the binary. In my opinion, we shouldn't assume that the user wants to publish their sources.

laurmaedje avatar Sep 11 '25 06:09 laurmaedje

I'd say only thing discouraging me from using this way would be it seems impossible to specify filename to be that of the document automatically.

This is why this issue is still open.

The way I understand this is: if I embed the whole directory as an attachment, I won't be able to tell which file I should pass to typst compile to build it again.

I feel like if your file structure is coherent, it will be pretty easy to find which file is the document's entrypoint. In most cases you wouldn't have to search through a lot of files anyway, and if it is really that big of a deal, you could write the path somewhere in the directory, for example in a readme file. (And if your project is very complex, you should have some sort of build system, which would require you to define an entrypoint anyway.)

I think the main concern of many people who want this feature is to not lose the source code. Many would be happy with a solution that embeds a source.zip containing Typst's root folder, plus a typst extract command that fetches the attachment and extracts it somewhere. Sure, having a way to tell which file is the entrypoint would be great, but I would delegate that to a follow-up issue.
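
As a minimal sketch of that source.zip idea, assuming the entrypoint is simply noted in a small text file inside the archive: the file name ENTRYPOINT.txt and both function names are hypothetical, and typst extract itself does not exist. The zip is built in memory, so attaching it to the PDF is left to whichever mechanism is used.

```python
import io
import zipfile
from pathlib import Path

# Hypothetical file name for the entrypoint note, not a Typst convention.
ENTRYPOINT_NOTE = "ENTRYPOINT.txt"


def make_source_zip(root, entrypoint):
    """Zip every file below root and record the entrypoint's relative path."""
    root = Path(root)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the root so the tree can be recreated.
                archive.write(path, path.relative_to(root).as_posix())
        archive.writestr(ENTRYPOINT_NOTE, Path(entrypoint).as_posix())
    return buffer.getvalue()


def read_entrypoint(zip_bytes):
    """Return the recorded entrypoint path from a source zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(ENTRYPOINT_NOTE).decode("utf-8")
```

The bytes returned by make_source_zip could be written to source.zip and attached to the PDF, and read_entrypoint would then tell a reader which file to pass to typst compile after unpacking.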

marie-bnl avatar Nov 27 '25 09:11 marie-bnl