v0 pdf rendering

Open skmnktl opened this issue 2 years ago • 16 comments

Tasks to be done:

  1. cleaning up the template itself
  2. build out the RESTful endpoints for getting the pdf
  3. much more testing. I'm not sure this will work for all the texts we've got. I'm making a lot of assumptions here.
  4. font selection
  5. deployment details (including LaTeX packages, etc.)
  6. multiple templates; perhaps an alternate template with at most 3 verses per page, so I can annotate while I translate

skmnktl avatar Sep 22 '22 01:09 skmnktl

@shreevatsa @akprasad Let me know if you have any suggestions for improvement. As @akprasad knows, I'm rather new to the building things business; most of my past work has been analytics, so I'm happy to get feedback!

skmnktl avatar Sep 22 '22 01:09 skmnktl

Thanks, this is a start!

One way to put off thinking about deployment etc is to simply generate a .tex file that can be downloaded, and leave turning it into a PDF (by running xelatex or whatever) to the user: this won't be as useful to most users as having an actual PDF, but it's a start, and keeps the initial problem minimal.

In fact, we could put it off for quite a while: could simply generate the .tex and .pdf files offline (doesn't have to run as part of Ambuda), and have them stored in some repo / S3 / whatever.

shreevatsa avatar Sep 22 '22 16:09 shreevatsa

Ah yes, that makes sense w.r.t. deployment. I think we can consider an S3 bucket + the Overleaf API to do the compilation. But that's for later. I'll start testing this weekend and see if I can get a good number of the texts we have working.

skmnktl avatar Sep 22 '22 18:09 skmnktl

Ohhh I didn't realize jinja could be used for this. Thank you!

skmnktl avatar Sep 23 '22 00:09 skmnktl

@shreevatsa I've converted the code to a jinja template.

  1. How do I suppress those god awful halting latex compile messages? I've tried xelatex and lualatex. Couldn't figure it out...
  2. I support two of the texts on Ambuda now: kumarasambhava and the śivopanishad. There's so much variation in texts. Right now I'm just writing a bunch of try/except blocks to get the fields to parse. Not sure if that's sustainable...
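
One alternative to wrapping each field lookup in its own try block is a small helper that returns a default when a field is missing. The following is only a minimal sketch: it assumes the XML is read with the standard library's ElementTree, and the element names (`header`, `title`, `author`) and the `field_text` helper are hypothetical, not Ambuda's actual schema or code.

```python
import xml.etree.ElementTree as ET


def field_text(root, path, default=""):
    """Return the stripped text at `path`, or `default` if it is missing or empty."""
    node = root.find(path)
    if node is None:
        return default
    # itertext() also collects text nested inside child elements.
    text = "".join(node.itertext()).strip()
    return text or default


# Hypothetical header, for illustration only: this one has no <author> element.
SAMPLE = "<text><header><title>Sample Title</title></header></text>"

root = ET.fromstring(SAMPLE)
title = field_text(root, "header/title", default="(untitled)")
author = field_text(root, "header/author", default="(unknown)")
```

With this shape, texts that lack a field fall back to a visible placeholder instead of raising, and the per-text differences stay in one place.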

skmnktl avatar Sep 30 '22 01:09 skmnktl

Should I focus on:

  1. adding texts, or
  2. getting the pdf to look nicer?

skmnktl avatar Sep 30 '22 01:09 skmnktl

@skmnktl On supporting more texts vs. making the PDF look nicer: it's up to you (whatever you're more motivated to do). If you're neutral between the two, I'd note that making the PDF look nicer can go on indefinitely once a first version gets in. Also consider a third option: cleaning up the PR for production.

I could think of a sequence of PRs like, say:

  • A Python script that takes in an XML file and generates a .tex file (OK if it doesn't look great or works only on some XML files).
  • Extending that script to more files.
  • Offline tasks for generating PDF files by running the scripts, and storing those PDF files somewhere.
  • A route in the app to point to / download the corresponding (previously generated) PDF file, given a text.

But could also think of doing it in other orders.
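
The first step in that sequence (XML in, .tex out) could be sketched roughly as follows. This is not the PR's make_latex.py: it assumes verses are marked up TEI-style as `<lg>` groups containing `<l>` lines, uses only the standard library instead of a template engine, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Characters LaTeX treats specially, mapped to safe replacements.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def xml_to_tex(xml_source, title, author):
    """Turn <lg>/<l> verse markup into a small standalone LaTeX document."""
    root = ET.fromstring(xml_source)
    out = [
        r"\documentclass{article}",
        r"\usepackage[parfill]{parskip}",
        r"\usepackage{fontspec}",
        r"\title{%s}" % escape_latex(title),
        r"\author{%s}" % escape_latex(author),
        r"\begin{document}",
        r"\maketitle",
        "",
    ]
    for element in root.iter():
        if local_name(element.tag) != "lg":
            continue
        verse = [
            escape_latex("".join(line.itertext()).strip())
            for line in element
            if local_name(line.tag) == "l"
        ]
        # Lines within a verse are separated by \\; a blank line ends the verse.
        out.append(" \\\\\n".join(verse))
        out.append("")
    out.append(r"\end{document}")
    return "\n".join(out) + "\n"


# Made-up input, not taken from an Ambuda text.
SAMPLE = """<div>
  <lg n="1"><l>first line of verse 1</l><l>second line of verse 1</l></lg>
  <lg n="2"><l>100% of verse 2</l></lg>
</div>"""

tex = xml_to_tex(SAMPLE, title="Sample", author="Unknown")
```

Escaping every text field before it reaches the template is one way to avoid a class of compile errors that only show up on particular texts.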

BTW what did you mean by "halting latex compile messages"?

shreevatsa avatar Sep 30 '22 03:09 shreevatsa

@shreevatsa For the tasks outlined:

A Python script that takes in an XML file and generates a .tex file (OK if it doesn't look great or works only on some XML files).

This is what make_latex.py does. I'm writing the resultant .tex file as output.tex.

Offline tasks for generating PDF files by running the scripts, and storing those PDF files somewhere.

Do we want to wait on this? Should I use AWS? Or perhaps a Google Cloud Storage Bucket? Or perhaps a mongo instance? If the main app will move to a more permanent storage, I can just piggyback off that. If you have preferences on what that storage will be in the future, we can do that as well.

A route in the app to point to / download the corresponding (previously generated) PDF file, given a text.

I'm hesitant to write this since the storage decisions have yet to be made. We could go the render-on-the-fly route, and for that I can make a FastAPI endpoint. There aren't that many texts right now anyway.

skmnktl avatar Sep 30 '22 11:09 skmnktl

@skmnktl Hi, to be clear, by "sequence of PRs" I meant doing each of those things in a separate PR (so to answer your latter two questions, yes definitely wait on storing in the cloud, and don't write the app now) — I was actually talking about removing stuff from this PR (or moving to another one), to keep it minimal and ready to merge. :-) So this PR would have only 3 files:

  • make_latex.py
  • parser.py
  • template.tex

if I understand correctly. (Because output.tex is generated by make_latex.py, and output.pdf is generated by running xelatex/lualatex on output.tex, and tufte-handout.cls / tufte-common.def are already part of TeX Live so don't need to be committed.)

In the meantime I'll download and compile output.tex on my computer, so that I can see what error messages you were talking about…

shreevatsa avatar Sep 30 '22 13:09 shreevatsa

@skmnktl I looked into output.tex locally. I see what you mean about the errors — there are indeed a lot of them, and even if hitting Enter (or putting TeX in batchmode) will proceed with those errors ignored, it's not a good state to get into :)

Let's start with a template where we understand why each line is needed, and which compiles without any errors or warnings, and add things as needed (e.g. the current template has a line or two of math-related stuff that I'm sure we won't need for a long time if ever).

The file compiles fine without any errors or warnings if I replace lines 1–53 of output.tex (or template.tex originally) with the following:

\documentclass{article}

\title{Kumārasaṃbhava}

\author{apauruṣeya}

\usepackage[parfill]{parskip}
\usepackage{fontspec}

\newenvironment{sanskrit}{}{}

or, if you want to keep tufte-handout (from looking around online, it seems to be not a very well-maintained package and has quite a few bugs unfortunately…), this works:

\documentclass[nobib]{tufte-handout}

\title{Kumārasaṃbhava}

\author{apauruṣeya}

\usepackage[parfill]{parskip}
\usepackage{fontspec}

% Fix for bug: see https://tex.stackexchange.com/q/200722
% and https://github.com/Tufte-LaTeX/tufte-latex/issues/64
% Set up the spacing using fontspec features
\renewcommand\allcapsspacing[1]{{\addfontfeature{LetterSpace=15}#1}}
\renewcommand\smallcapsspacing[1]{{\addfontfeature{LetterSpace=10}#1}}

\newenvironment{sanskrit}{}{}

Feel free to add your font choices back in later.

There were also some errors from Babel's Sanskrit setup; we don't seem to need it for this output.tex, and we can also consider using polyglossia instead of babel. (Sorry I haven't looked into the Devanagari version yet; I didn't run make_latex.py and just tried working with the output.tex that was already in the PR.)

shreevatsa avatar Sep 30 '22 14:09 shreevatsa

@shreevatsa I took your comments above and did a few things with my last commit:

  1. I removed the tufte package entirely.
  2. I removed a bunch of vestigial functions.
  3. I changed the template block and variable markers for jinja.
  4. I have a script in src now that toggles PDF creation and then moves the PDF to render_pdf/pdf_outputs, named <text_title>_<author>.pdf.
  5. And all the intermediate files are deleted (including the tex file).
  6. For now, I've suppressed the "sanskrit" environment that I had earlier for babel. But once we start supporting documents transliterated into other scripts, we can set those up I think.
  7. The runscript now uses latexmk rather than xelatex, though it does still call the xelatex engine in the background.
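
For reference, the build steps in items 4, 5 and 7 could look roughly like the sketch below. It is an assumption-laden illustration, not the script in this PR: the function names and the `build` directory are invented, and filenames are sanitized in a way the PR may not do. The latexmk options used (`-xelatex`, `-interaction=nonstopmode`, `-halt-on-error`, `-outdir=`) are standard ones; nonstopmode also keeps TeX from stopping to prompt on each error.

```python
import re
import shutil
import subprocess
from pathlib import Path


def output_name(text_title, author):
    """Build '<text_title>_<author>.pdf', replacing characters unsafe in filenames."""
    def clean(part):
        return re.sub(r"[^\w-]+", "-", part.strip()).strip("-")
    return f"{clean(text_title)}_{clean(author)}.pdf"


def latexmk_command(tex_file, build_dir):
    """Return the latexmk invocation as an argument list (no shell involved)."""
    return [
        "latexmk",
        "-xelatex",                  # use the xelatex engine
        "-interaction=nonstopmode",  # never stop to prompt on errors
        "-halt-on-error",            # fail fast instead of continuing past errors
        f"-outdir={build_dir}",      # keep intermediates out of the source tree
        str(tex_file),
    ]


def build_pdf(tex_file, text_title, author, out_dir=Path("render_pdf/pdf_outputs")):
    """Compile tex_file, move the PDF into out_dir, and delete the intermediates."""
    tex_file = Path(tex_file)
    build_dir = tex_file.parent / "build"
    subprocess.run(latexmk_command(tex_file, build_dir), check=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / output_name(text_title, author)
    shutil.move(str(build_dir / tex_file.with_suffix(".pdf").name), str(target))
    shutil.rmtree(build_dir)  # removes .aux, .log and other intermediate files
    return target
```

Calling `build_pdf("output.tex", "Sample Text", "A. Author")` would need latexmk and XeTeX installed; with `check=True` a failed compile raises instead of leaving a half-built PDF behind.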

skmnktl avatar Oct 02 '22 02:10 skmnktl

@skmnktl Great! I still see the tex_files with output.tex etc in the PR; could you take another look? Maybe they didn't get removed properly…

shreevatsa avatar Oct 02 '22 03:10 shreevatsa

By the way, if it helps, feel free to consider splitting this PR further, and checking in just one of the two:

  1. just the parsing code to get "data" out of the TEI XML file,
  2. just the LaTeX template and build scripts, with clear documentation on what data it needs as input.

(E.g. it's ok to merge just (2) for now, and work on (1) in parallel.)
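
One way to document what data the template needs is to write the contract down as plain data classes that the parser fills in and the template reads. The shape below is purely a guess for illustration; the class and field names are hypothetical, and the real template may need different fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verse:
    number: str        # verse label as printed, e.g. "1.1"
    lines: List[str]   # one entry per line of the verse


@dataclass
class Section:
    title: str
    verses: List[Verse] = field(default_factory=list)


@dataclass
class TextData:
    """Everything the LaTeX template is allowed to rely on."""
    title: str
    author: str
    sections: List[Section] = field(default_factory=list)


# Made-up example data, only to show the shape.
example = TextData(
    title="Sample",
    author="Unknown",
    sections=[Section(title="1", verses=[Verse(number="1.1", lines=["line a", "line b"])])],
)
```

With a contract like this, (1) and (2) can be developed in parallel: the parser is done when it produces a `TextData`, and the template can be tested against hand-written instances.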

shreevatsa avatar Oct 11 '22 05:10 shreevatsa

Returning to this PR -- I suggest we scope this PR down to the simplest thing that adds new functionality then iterate on it with follow-up PRs.

akprasad avatar Apr 07 '23 01:04 akprasad

Do you want to meet this weekend to finalize this? I can wrap it up in the next couple of days.

skmnktl avatar Apr 07 '23 01:04 skmnktl

Sure, I just scheduled a weekly sync (see #general on Discord), or we can find some time elsewhere.

akprasad avatar Apr 07 '23 01:04 akprasad